Abstract. This paper initiates the study of the impact of failures on the fundamental problem of information spreading in the Vertex-Congest model, in which in every round, each of the $n$ nodes sends the same $O(\log n)$-bit message to all of its neighbors.

Our contribution to coping with failures is twofold. First, we prove that the randomized algorithm which chooses the next message to forward uniformly at random is slow, requiring $\Omega(n/\sqrt{k})$ rounds on some graphs, which we denote by $G_{n,k}$, where $k$ is the vertex-connectivity.

Second, we design a randomized algorithm that makes dynamic message choices, with probabilities that change over the execution. We prove that for $G_{n,k}$ it requires only a near-optimal number of $O(n\log^3 n/k)$ rounds, despite a rate of $q = O(k/(n\log^3 n))$ failures per round. Our technique of choosing probabilities that change according to the execution is of independent interest.

Keywords: distributed computing, information spreading, randomized 
algorithms, vertex-connectivity, fault tolerance 


1 Introduction 


Coping with failures is a cornerstone challenge in the design of distributed algorithms. It is desirable that a distributed system continues to operate correctly despite a reasonable number of failures, and hence obtaining fault tolerance has been a fundamental goal in this field. The impact of failures has been studied in various models of computation and for various distributed tasks.

In this paper, we initiate the study of the robustness of information spreading against failures in the Vertex-Congest model of computation. Information spreading requires each node of the network to obtain the information of all other nodes. This problem is at the heart of many distributed applications that perform global tasks, and thus is a central issue in distributed computing (see, e.g., [16]). The Vertex-Congest model, where in each round every node generates an $O(\log n)$-sized packet and sends it to all of its neighbours, abstracts the behavior of wireless networks that operate on top of an abstract MAC layer [11] that takes care of collisions.

The time required for achieving information spreading depends on the structure of the communication graph. Even without faults, it is clear that having a minimum vertex cut of size $k$ implies an $\Omega(n/k)$ lower bound on the running time of any algorithm in the above model, and hence our study addresses the $k$-vertex-connectivity of the graph. The diameter of a graph is a trivial lower bound on the number of rounds required for spreading even without faults, and hence, for $k$-vertex-connected graphs, $\Omega(n/k)$ is a general lower bound, as there exist $k$-vertex-connected graphs of diameter $n/k$.

A tempting approach would be to use randomization for choosing which message to forward in each round of communication, in the hope that this would be naturally robust against failures. However, we show that the uniform randomized algorithm is slow on a $k$-vertex-connected family of graphs, denoted $G_{n,k}$, which consists of $n/k$ cliques of size $k$ that are connected by perfect matchings, requiring $\Omega(n/\sqrt{k})$ rounds.

Instead, this paper presents an algorithm for spreading information in the 
Vertex-Congest model that uses dynamic probabilities for selecting the messages 
to be sent in each round. We prove that for $G_{n,k}$, the round complexity of our algorithm is almost optimal and that it is highly robust against node failures.

1.1 Our Contribution 

As explained, our first contribution is proving that the intuitive idea of simply choosing at random which message to forward is not efficient. The proof is based on the fact that the probability that a given message in a node is chosen and forwarded is inversely proportional to the number of messages that node has received. The larger the number of messages received by the nodes of a clique, the longer it takes for any newly received message to be forwarded to the nodes of the next clique. The full proof appears in the appendix.

Theorem 1.1. The uniform random algorithm requires $\Omega(n/\sqrt{k})$ rounds on $G_{n,k}$, in expectation.

Our main result is an algorithm in which the probabilities for sending messages in each round are not fixed, but rather change dynamically during the execution based on how it evolves. Roughly speaking, the probability of sending a message is set according to the number of times it was received, with the goal of giving higher probabilities to less popular messages. The key intuition behind this approach is that nodes can take responsibility for forwarding messages that they received only a few times, while they can assume that messages that have been received many times have already been forwarded throughout the network. This way, we aim to combine the qualities of both random and static approaches, obtaining an algorithm that is both fast and robust.

This basic approach alone turns out to be insufficient. It allows each message to be sent quickly through multiple paths in the network, but it requires an additional mechanism in order to be robust against failures. Our next step is to augment our algorithm with additional rounds of communication that allow the paths to change dynamically as the execution unfolds, essentially bypassing faulty nodes. These shuffle phases provide fault-tolerance while retaining the efficiency of the algorithm. We consider a strong failure model, in which links are reliable but nodes fail independently with probability $q$ per round and never recover, and prove the following result, which holds with high probability¹.


Theorem 1.2. Alg. 2 completes full information spreading on $G_{n,k}$ in $O(\frac{n}{k}\log^3 n)$ rounds, for any node failure probability per round $q$, $0 \le q \le O(\frac{k}{n\log^3 n})$, w.h.p.


While our algorithm is general and does not assume any knowledge of the topology of the network, showing that it is fast and robust for $G_{n,k}$ is important, as this graph is basically a $k$-vertex-connected generalization of a simple path. This constitutes a first step towards answering this key question for general $k$-vertex-connected graphs. By making minor changes to $G_{n,k}$, we can cover additional graphs with the same or a similar analysis. We believe that the same approach works for additional families of $k$-vertex-connected graphs.


1.2 Additional Background and Related Work 


One approach for disseminating information that was introduced in [1] and has been intensively studied (e.g., [5], [9], [?], [15]) is network coding. Instead of simply relaying the packets they receive, the nodes of a network take several packets and combine them together for transmission. An example is random linear network coding (RLNC), presented in [10]. Among its advantages is improving the network's throughput [9]. A conclusion that can be derived from the analysis shown in [8] is that RLNC spreads the information in $O(n/k)$ rounds, w.h.p.

However, network coding requires sending large coefficients, which do not fit within the restriction on the packet size that is imposed in the Vertex-Congest model. An additional disadvantage stems from the fact that decoding is done by solving a system of linearly independent equations in $n$ variables, one variable for each of the original messages. Thus, the decoding process requires the reception of a sufficient number of packets by the node before it can start reproducing the original information. Unfortunately, in most cases, this sufficient number of packets equals the number of original messages, which means that decoding happens only at the end of the process. This issue is of major importance in applications that broadcast videos or presentations. For example, when watching online content, one would prefer displaying the downloaded parts of an image immediately on the screen, rather than waiting with an empty screen until the image is fully downloaded.

An almost-optimal algorithm that requires $O(n\log n/k)$ rounds with high probability has been shown in [2]. It is based on a preprocessing stage which constructs vertex-disjoint connected dominating sets (CDSs), which are then used to route messages in parallel through all the CDSs. However, this algorithm is not robust, for the following reason. In the basic algorithm, the failure of a single node in a CDS suffices to render the entire structure faulty. This sensitivity can be easily fixed by combining $O(\mathrm{polylog}(n))$ CDSs together into well-connected components and sending information redundantly over each CDS in the component, incurring only an $O(\mathrm{polylog}(n))$ factor of slowdown in runtime. Nevertheless, the construction itself, of the CDS packings, is highly sensitive to failures. It is an important open problem whether CDS packings can be constructed under faults.

1 We use the phrase "with high probability" (w.h.p.) to indicate that an event happens with probability at least $1 - 1/n^c$ for a constant $c > 1$.

Randomized protocols were designed to overcome similar problems of fault tolerance in various settings [6,7], as they are naturally fault-tolerant. The approach taken in this paper, of changing the probabilities of sending messages according to how the execution evolves such that they are inversely proportional to the number of times a message has been received, bears some resemblance to and borrows ideas from [4], where a fault-tolerant information spreading algorithm was designed for gossiping, which is a different model of communication. Apart from the high-level intuition, the model of communication and the implementation and analysis are completely different.


1.3 Preliminaries 


We assume a network with $n$ nodes that have unique identifiers of $O(\log n)$ bits. Each node $u$ holds one message, denoted $m_u$. An information spreading algorithm distributes the message of each node in the network to all other nodes.

In the Vertex-Congest model, each node knows its neighbours but does not know the global graph topology. The execution proceeds in a sequence of synchronous rounds. In each round, every node generates a packet and sends it to all of its neighbours. The packet size is bounded by $O(\log n)$ bits and can encapsulate one message, in addition to some header.

An $n$-node graph is said to be $k$-vertex-connected if the graph resulting from deleting any (perhaps empty) set of fewer than $k$ vertices remains connected. An equivalent definition is that a graph is $k$-vertex-connected if for every pair of its vertices it is possible to find $k$ internally vertex-disjoint paths connecting these vertices. In this paper we assume that $k = \omega(\log^3 n)$.

We consider a strong failure model, in which links are reliable but nodes fail 
independently with probability q per round and never recover. 


2 A Fast Information Spreading Algorithm 

In this section, we describe our basic information spreading algorithm. We emphasize that the algorithm does not assume anything about the underlying graph, except for a polynomial bound on its size. In particular, the nodes do not know the vertex-connectivity of the graph, nor any additional information about its topology. Each node $u$ has a set of received messages, whose content at the beginning of round $t$ is denoted $R_u(t)$. We use $cnt_{u,v}(t)$ to denote the number of times node $u$ has received message $m_v$ by the beginning of round $t$. Denote by $S_u(t)$ the set of messages sent by node $u$ by the beginning of round $t$. Define $B_u(t) = R_u(t) \setminus S_u(t)$, the set of messages that are known to node $u$ at the beginning of round $t$ but not yet sent. We refer to $B_u(t)$ as a logical variable, whose value changes implicitly according to updates in the actual variables $R_u(t)$ and $S_u(t)$. For every node $u$, we have $S_u(0) = \emptyset$, $R_u(0) = \{m_u\}$, $cnt_{u,u}(0) = 1$, and $cnt_{u,v}(0) = 0$ for each $v \neq u$.
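As an illustration only, the per-node variables above can be kept in a small record. The sketch below is ours, not the paper's: it identifies message $m_v$ with the identifier $v$, and the names `NodeState`, `pending`, `record_send`, and `record_receive` are hypothetical. It shows that $B_u(t)$ never needs to be stored, since it is recomputed from $R_u(t)$ and $S_u(t)$.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Variables of one node u (hypothetical names; message m_v is identified with v)."""
    uid: int
    received: set = field(default_factory=set)  # R_u(t)
    sent: set = field(default_factory=set)      # S_u(t)
    cnt: dict = field(default_factory=dict)     # cnt_{u,v}(t); a missing key means 0

    def __post_init__(self):
        # Initial values: S_u(0) is empty, R_u(0) = {m_u}, cnt_{u,u}(0) = 1.
        self.received.add(self.uid)
        self.cnt[self.uid] = 1

    def count(self, v):
        return self.cnt.get(v, 0)

    @property
    def pending(self):
        """The logical variable B_u(t) = R_u(t) minus S_u(t), recomputed on demand."""
        return self.received - self.sent

    def record_send(self, v):
        self.sent.add(v)

    def record_receive(self, v):
        self.received.add(v)
        self.cnt[v] = self.count(v) + 1


u = NodeState(uid=3)
u.record_send(3)      # Round 0: u sends its own message m_3
u.record_receive(5)   # m_5 arrives from two different neighbours
u.record_receive(5)
```

Deriving `pending` from the two sets mirrors the text's distinction between actual variables and the logical variable $B_u(t)$.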

We present an algorithm, Alg. 1, that consists of two types of phases: a random phase and ranking phases (see Fig. 3). Let $t_0$ be the round number at the beginning of the random phase, and let $\hat{t}_0$ be the round number after the random phase. Let $t_p$ be the round number at the beginning of ranking phase $p$, and let $\hat{t}_p$ be the round number after ranking phase $p$, starting from $p = 1$. In this algorithm, it holds that $\hat{t}_p = t_{p+1}$ for every $p \ge 1$, and $t_0 = 1$. We will later modify this algorithm in Section 4, where we argue about properties that hold at $\hat{t}_p$ and at $t_{p+1}$ separately. Denote by $\hat{B}_u(t_p)$ the buffer of node $u$ at time $t_p$. Unlike $B_u(t)$, $\hat{B}_u(t_p)$ is an actual variable that does not implicitly change according to $R_u(t)$ and $S_u(t)$. We assign a value to it at the beginning of every phase, that is, $\hat{B}_u(t_p) = B_u(t_p)$, and make sure that its content only gets smaller during a phase. The parameters $a$ and $d$ are constants that are fixed later, at the end of Section 3. The algorithm runs as follows, where in each round every node sends a message and receives messages from all of its neighbors:

(1) Single round (Round 0): This is the first round of the algorithm, where every node $u$ sends its message $m_u$.

(2) Random phase: This is the first phase of the algorithm, which consists of $r = a\log n$ rounds. In each round $t$, every node $u$ picks a message to send from $\hat{B}_u(t_0)$ uniformly at random, and removes it from the set.

(3) Consecutive ranking phases: Each of these phases consists of $t' = 8dr\log^2 n$ rounds. At the beginning of such a phase, each node uses the Ranking Function (Fig. 1), which defines a probability space over the messages in $\hat{B}_u(t_p)$. In each round, every node $u$ picks a message to send from $\hat{B}_u(t_p)$ according to the probability space, and removes it from the set.


Ranking Function. The ranking function (in Fig. 1) is calculated by each node, and defines a probability space over its messages. Each node $u$ sorts the messages in $\hat{B}_u$ according to their cnt values, from smallest to largest, breaking ties arbitrarily. Denote by $rank_m$ the position of message $m$ within the sorted list, and let $b = |\hat{B}_u|$ be the size of the list. We consider the probability space in which the probability for a message $m$ with $rank_m = r$ to be picked is $\frac{1}{r H_b}$. Namely, the probability is inversely proportional to $r$. The $b$-th harmonic number, $H_b = \sum_{i=1}^{b} 1/i$, is a normalization factor (over the whole list of messages). This means that messages in lower positions (lower $rank_m$ values, implying lower cnt values) are more likely to be picked.
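A minimal sketch of the Ranking Function, assuming (our choice) that messages are hashable ids and that cnt values are given as a dictionary. Exact fractions make it easy to confirm that the probabilities $1/(r H_b)$ sum to 1; breaking ties by message id is one admissible instance of "arbitrary" tie-breaking, not something the paper prescribes.

```python
from fractions import Fraction


def ranking_function(buffer, cnt):
    """Map each buffered message to 1 / (rank * H_b), ranking by ascending cnt."""
    # Ties on cnt are broken by message id: one admissible "arbitrary" choice.
    m_list = sorted(buffer, key=lambda m: (cnt.get(m, 0), m))
    b = len(m_list)
    h_b = sum(Fraction(1, i) for i in range(1, b + 1))  # b-th harmonic number H_b
    return {m: 1 / (rank * h_b) for rank, m in enumerate(m_list, start=1)}


probs = ranking_function({10, 11, 12}, {10: 5, 11: 1, 12: 3})
```

With cnt values 1, 3, and 5, the three messages receive probabilities 6/11, 3/11, and 2/11, since $H_3 = 11/6$.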

Algorithm 1 for each node u

    SyncRound($m_u$)                                   ▷ Round 0
    RandomPhase()
    loop
        RankingPhase()
    end loop

    procedure SyncRound($m$)                           ▷ A synchronized round
        send($m$)
        $S_u(t) \gets S_u(t) \cup \{m\}$
        $R \gets$ received messages
        for all $m_v \in R$ do
            $R_u(t) \gets R_u(t) \cup \{m_v\}$
            $cnt_{u,v}(t) \gets cnt_{u,v}(t) + 1$
        end for
        $t \gets t + 1$
    end procedure

    RandomPhase
        $\hat{B}_u(t_0) \gets B_u(t)$                   ▷ $t = t_0$
        loop $r$ times                                  ▷ $r = a\log n$
            $m \gets$ pop a message from $\hat{B}_u(t_0)$ uniformly at random
            SyncRound($m$)
        end loop

    RankingPhase $p$
        $\hat{B}_u(t_p) \gets B_u(t)$                   ▷ $t = t_p$
        Prob $\gets$ RankingFunction($\hat{B}_u(t_p)$)
        loop $t'$ times                                 ▷ $t' = 8dr\log^2 n$
            $m \gets$ pop a message from $\hat{B}_u(t_p)$ according to Prob
            Nullify Prob[$m$] (update Prob accordingly)
            SyncRound($m$)
        end loop

    function RankingFunction(Buffer $\hat{B}_u$)
        mList $\gets$ sort $\hat{B}_u$ increasingly according to cnt values
        $b \gets$ length(mList)
        for all $1 \le r \le b$ do Prob[mList[$r$]] $\gets 1/(r H_b)$ end for
        return Prob
    end function

Fig. 1. The Ranking Function

Other interesting variants of probability distributions over the messages might work as well. For example, the inverse proportion might be raised to some exponent, or be a function of the cnt values instead of the rank $r$. Our ranking function was selected because it is very simple and fits perfectly in Lemma 2.2. In the algorithm, the probability space used by a node $u$ during a phase is calculated at the start of the phase. In ranking phases, it is defined according to the Ranking Function. In the random phase, it is the uniform distribution. Within a phase, the only modifications in the probability space of a node are due to the non-repetitive sending policy², i.e., the need to nullify the probabilities of messages that have already been sent. When a message is sent, the modification can be done, for example, by updating the normalization factor, or alternatively by distributing the probability of the sent message among all other messages (say, proportionally to their current probabilities). In either case, the probability of each message can only grow during a phase, as long as it is not sent. Namely, the initial probability of a message (at the beginning of a phase) is a lower bound on its probability for the rest of the phase (as long as it is not sent). Probabilities are not defined for messages that were not known at the start of a phase and were first received during the phase; thus these messages have no chance of being sent until the next phase starts.

2 There is no point in re-sending messages, as all links are reliable.
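The nullification step can be sketched as follows. This is an illustrative helper of ours (`pop_message` is a hypothetical name) that draws one message and then renormalizes the remaining probabilities; dividing by the remaining mass is exactly "updating the normalization factor", and it coincides with redistributing the sent message's probability proportionally. The assertions in the usage check the property stated above: remaining probabilities never decrease.

```python
import random
from fractions import Fraction


def pop_message(prob, rng):
    """Draw one message according to prob, then nullify it and renormalize the rest.

    prob must be non-empty; it is modified in place."""
    msgs = list(prob)
    chosen = rng.choices(msgs, weights=[float(prob[m]) for m in msgs], k=1)[0]
    del prob[chosen]                       # non-repetitive sending
    total = sum(prob.values())             # remaining mass, at most 1
    for m in prob:
        prob[m] = prob[m] / total          # each remaining probability can only grow
    return chosen


rng = random.Random(0)
prob = {"a": Fraction(6, 11), "b": Fraction(3, 11), "c": Fraction(2, 11)}
before = dict(prob)
first = pop_message(prob, rng)
```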

The Phase Separation Property. Changes in cnt values during a phase (due to the reception of messages) do not affect the probability space of this phase, as it is calculated only at the start of each phase. This implies that messages that are first received by a node after the start of the random phase or a ranking phase have zero probability of being sent during that phase, and can be sent by the node only starting from the next phase, when the probability space is recalculated. We call this the phase separation property, and it implies the following:
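The following toy simulation is our own sketch, not the authors' code. It runs Alg. 1 on a small $G_{n,k}$ with hypothetical small parameters (`r`, `phase_len`, and `max_phases` stand in for $a\log n$, $t' = 8dr\log^2 n$, and the unbounded loop), so it illustrates the mechanics rather than the asymptotic bounds, which require $k = \omega(\log^3 n)$. Phase separation appears as a per-phase snapshot of the buffer. Alg. 1 does not specify what a node with an empty buffer sends; here it stays silent, which is an assumption.

```python
import random
from itertools import combinations


def gnk(n, k):
    """G_{n,k}: node u lies in clique u // k and layer u % k (hypothetical encoding)."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if u // k == v // k or v - u == k:  # same clique, or matched neighbours
            adj[u].add(v)
            adj[v].add(u)
    return adj


def simulate_alg1(n, k, r, phase_len, max_phases, seed=0):
    """Number of ranking phases until every node knows every message, or None."""
    rng = random.Random(seed)
    adj = gnk(n, k)
    received = {u: {u} for u in range(n)}  # R_u; message m_v is identified with v
    sent = {u: set() for u in range(n)}    # S_u
    cnt = {u: {u: 1} for u in range(n)}    # cnt_{u,v}

    def sync_round(choice):
        # Choices are fixed before delivery, so receptions in this round
        # cannot influence what is sent in this round.
        for u, m in choice.items():
            if m is None:                  # empty buffer: stay silent (assumption)
                continue
            sent[u].add(m)
            for w in adj[u]:
                received[w].add(m)
                cnt[w][m] = cnt[w].get(m, 0) + 1

    def done():
        return all(len(received[u]) == n for u in range(n))

    sync_round({u: u for u in range(n)})   # Round 0: every node sends m_u

    # Random phase: buffer frozen at the phase start, uniform pops.
    buf = {u: list(received[u] - sent[u]) for u in range(n)}
    for _ in range(r):
        sync_round({u: buf[u].pop(rng.randrange(len(buf[u]))) if buf[u] else None
                    for u in range(n)})

    phases = 0
    while not done():
        if phases == max_phases:
            return None
        phases += 1
        # Ranking phase: rank the frozen buffer by cnt at the phase start.
        # Weights 1/rank; random.choices normalizes them, which matches
        # 1/(rank * H_b) and its renormalization after each removal.
        weight = {}
        for u in range(n):
            order = sorted(received[u] - sent[u], key=lambda m, u=u: (cnt[u][m], m))
            weight[u] = {m: 1.0 / rank for rank, m in enumerate(order, start=1)}
        for _ in range(phase_len):
            choice = {}
            for u in range(n):
                if weight[u]:
                    msgs = list(weight[u])
                    m = rng.choices(msgs, weights=[weight[u][x] for x in msgs])[0]
                    del weight[u][m]       # nullify the sent message
                    choice[u] = m
                else:
                    choice[u] = None
            sync_round(choice)
    return phases


print(simulate_alg1(n=24, k=4, r=3, phase_len=12, max_phases=50))
```

Because messages received mid-phase only enter the next snapshot, the number of phases reported for $G_{24,4}$ (diameter 6) is at least 4, in line with the proposition that follows.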

Proposition 2.1. At the start of ranking phase p, every message has propagated 
to a distance of at most p + 1. 

The following lemma holds for any node and for a general graph. Its proof appears in the appendix.

Lemma 2.2. Let $m$ be a message with $rank_m \le 8r$ (recall that $r = a\log n$). Then $m$ is sent during the ranking phase with probability at least $1 - n^{-d}$.
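The appendix contains the actual proof of Lemma 2.2. As a hedged plausibility check only, the sketch below evaluates one simple bound under assumptions of ours: logarithms in $r$ and $t'$ are base 2, the buffer holds at most $n$ messages (so $H_b \le 1 + \ln n$), and, as argued in the discussion of nullification, the per-round probability of $m$ never drops below its initial value $1/(rank_m H_b)$. The chance that $m$ is never picked in $t'$ rounds is then at most $(1 - 1/(rank_m H_b))^{t'}$. The function name is hypothetical.

```python
import math


def miss_probability_bound(n, a, d, rank):
    """Upper bound on Pr[a message of this rank is not sent in one ranking phase].

    Assumptions (ours): log = log2 in r and t', H_b <= 1 + ln n, and the pick
    probability of the message never falls below 1 / (rank * H_b)."""
    r = a * math.log2(n)                      # length of the random phase
    t_phase = 8 * d * r * math.log2(n) ** 2   # t' = 8 d r log^2 n
    h_b = 1 + math.log(n)                     # H_b <= 1 + ln b <= 1 + ln n
    p0 = 1 / (rank * h_b)                     # initial pick probability
    return (1 - p0) ** t_phase                # each of the t' rounds misses w.p. <= 1 - p0


# Worst admissible rank, 8r, compared with the target n^{-d} of Lemma 2.2.
for n in (16, 1024, 10**6):
    for a, d in ((1, 1), (2, 3)):
        worst = miss_probability_bound(n, a, d, rank=8 * a * math.log2(n))
        print(n, a, d, worst <= n ** (-d))
```

Under these assumptions the exponent is $t'/(8r H_b) \ge d\log_2^2 n/(1+\ln n) \ge d\ln n$ for $n \ge 3$, which is where the $n^{-d}$ comes from in this sketch.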


3 Time Analysis for $G_{n,k}$


Recall that $G_{n,k}$ is the graph that consists of $n/k$ cliques of size $k$ (assume $n/k$ is an integer), with a perfect matching between every two consecutive cliques (see Fig. 2 in the appendix). Clearly, $G_{n,k}$ is $k$-vertex-connected.
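To make the construction concrete, the following sketch builds the adjacency sets of $G_{n,k}$. Encoding the node in clique $C_{i+1}$ (0-based index $i$) and layer $j$ as the integer $ik + j$ is an assumption made for illustration; with it, the perfect matching between consecutive cliques simply connects nodes of the same layer.

```python
from itertools import combinations


def build_gnk(n, k):
    """Adjacency sets of G_{n,k}; node i*k + j is in clique i (0-based) and layer j."""
    if n % k != 0:
        raise ValueError("n/k must be an integer")
    adj = {u: set() for u in range(n)}
    for i in range(n // k):
        members = range(i * k, (i + 1) * k)
        for u, v in combinations(members, 2):   # clique edges
            adj[u].add(v)
            adj[v].add(u)
        if i + 1 < n // k:                      # perfect matching to the next clique
            for u in members:
                adj[u].add(u + k)
                adj[u + k].add(u)
    return adj


g = build_gnk(12, 4)  # three cliques of size 4
```

Nodes of the end cliques have degree $k$ and nodes of inner cliques have degree $k+1$, which is the source of the buffer sizes in Proposition 3.4.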


Additional Definitions. Denote by $\mathcal{C}$ the set of all cliques. Recall the enumeration of the cliques, and denote by $C_i$ clique number $i$, $i \in \{1, \ldots, n/k\}$. Denote by $C(u)$ the clique that contains node $u$. A layer $L$ is a set of $n/k$ nodes, one from each clique, that form a path starting in $C_1$ and ending in $C_{n/k}$. We denote by $\mathcal{L}$ the set of all $k$ layers. The layer $L(u) \in \mathcal{L}$ is the layer that contains node $u$. Notice that within the same clique, different nodes belong to different layers.

We now analyze the time complexity of the algorithm to spread information over $G_{n,k}$. For simplicity, we analyze the flow of messages from $C_j$ to $C_i$, where $j < i$. The opposite direction of flow and its analysis are symmetric.


Theorem 3.1. Alg. 1 completes full information spreading on $G_{n,k}$ in $O(\frac{n}{k}\log^3 n)$ rounds, w.h.p.

The theorem is directly proved based on Lemma 3.2, as follows.





Lemma 3.2 (Iteration). For every $i$, $1 \le i \le n/k$, every node $u \in C_i$, and every node $v$ such that $v \in C_j$ for some $i - p \le j \le i$, it holds that $m_v \in R_u(\hat{t}_p)$, w.h.p.


Proof (of Theorem 3.1). Lemma 3.2 shows that by the end of ranking phase $p$, w.h.p., each node $u$ knows all messages $m_v$ originating at distance at most $p$. This implies that full information spreading is completed after $n/k$ phases, since $n/k$ is the diameter of the graph. Since each phase takes $O(\log^3 n)$ rounds, this proves Theorem 3.1. □


In the rest of the section we prove Lemma 3.2. The following definition is useful to indicate that a node shares responsibility for disseminating a message.

Definition 3.3 (Fresh message). A fresh message of a node $u$ at time $t$ is a message $m_v \in R_u(t)$ for which $cnt_{u,v}(t) < T$, for threshold $T = r/2$.

General Idea of the Proof. At the end of round 0, every message $m_v$ is disseminated in its own clique $C(v)$. Then, we show that by the end of the random phase, w.h.p., each message $m_v$ is sent by a sufficiently large number of nodes $u \in C(v)$ to become non-fresh in all nodes of the clique $C(v)$. Simultaneously, each of these messages becomes known and fresh in a sufficiently large number of nodes in the neighboring clique.

Then we show that ranking phases shift and preserve this situation. At the 
beginning of every ranking phase, every fresh message in a node is also fresh in 
a sufficiently large number of nodes within the same clique. During the phase, 
all of the fresh messages are sent w.h.p., implying that each one of the messages 
(i) is disseminated in the clique; (ii) is not fresh in nodes of the clique anymore; 
and (iii) is fresh in a sufficiently large number of nodes in the neighboring clique. 

The combination of properties (ii) and (iii) is the crux of the proof. It guarantees that the process progresses iteratively, as it leads to similar conditions again and again at the beginning of every new ranking phase. This happens
because every node can easily distinguish between a new message received from 
nodes within the clique (becomes non-fresh by the end of the phase), and a new 
message received from the neighbor in the neighboring clique (stays fresh at the 
end of the phase, and should be sent during the next phase). We emphasize that 
all of this is done implicitly, without knowing the structure of the network. 
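The distinction described above uses nothing but the freshness test of Definition 3.3. The helper below is a hypothetical sketch of ours, applied to made-up counts: a message relayed by several clique-mates crosses the threshold $T = r/2$, while a message that arrived once over the matching edge stays fresh.

```python
def fresh_messages(received, cnt, r):
    """Messages m_v in R_u(t) with cnt_{u,v}(t) < T, where T = r/2 (Definition 3.3)."""
    threshold = r / 2
    return {m for m in received if cnt.get(m, 0) < threshold}


# Hypothetical state of one node u after a ranking phase, with r = 8 (so T = 4):
# "a" was relayed by 5 clique-mates, "b" arrived once from the neighboring clique.
state = fresh_messages({"a", "b"}, {"a": 5, "b": 1}, r=8)
```

Only "b" remains fresh, so only "b" keeps a low rank for the next ranking phase.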

This iterative behavior of the combined properties guarantees that every message propagates one additional clique per phase, until full information spreading completes after $O(n/k)$ phases.

Let $t'$, for $0 \le t' \le r-1$, be the time elapsed since the first round of the random phase, i.e., $t' = t - t_0$. The following proposition is immediate from the pseudocode:

Proposition 3.4. At the beginning of the random phase, $\hat{B}_u(t_0)$ for every node $u \in C_i$ contains exactly $k-1$ messages $m_v$ originating at $v \in C_i$, and at most two additional messages, one originating at $v \in C_{i-1} \cap L(u)$ and one originating at $v \in C_{i+1} \cap L(u)$. Thus, it holds that $|\hat{B}_u(t_0+t')| = k+1-t'$ for $i = 2, 3, \ldots, n/k - 1$, and $|\hat{B}_u(t_0+t')| = k - t'$ for $i = 1, n/k$.

Namely, nodes of inner cliques ($C_i$, $1 < i < n/k$) start the random phase with $|\hat{B}_u(t_0)| = k + 1$, while nodes of cliques $C_1$ and $C_{n/k}$ start the random phase with $|\hat{B}_u(t_0)| = k$.
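A quick check of these buffer sizes, as a sketch of ours that reuses the hypothetical encoding "node $u$ is in clique $u // k$": after Round 0, node $u$ has received one message per neighbour and has already sent its own.

```python
from itertools import combinations


def round0_buffer_sizes(n, k):
    """|B_u(t_0)| for every node of G_{n,k} right after Round 0."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if u // k == v // k or v - u == k:  # clique edge or matching edge
            adj[u].add(v)
            adj[v].add(u)
    # Round 0: every neighbour w sends m_w to u, and u has sent m_u,
    # so B_u(t_0) = R_u(t_0) minus {m_u}.
    received = {u: {u} | adj[u] for u in range(n)}
    sent = {u: {u} for u in range(n)}
    return {u: len(received[u] - sent[u]) for u in range(n)}


sizes = round0_buffer_sizes(20, 5)  # four cliques of size 5
```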
3.1 Analysis of the Random Phase


The following lemma analyzes the initial random phase, and shows that every message $m_v$ is non-fresh in all nodes of $C(v)$ at the end of the random phase:

Lemma 3.5. At the end of the random phase, for every message $m_v$ and for all nodes $u \in C(v)$, $m_v$ is non-fresh for $u$, with probability at least $1 - 1/n^{a/48-1}$.


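Proof. Fix $v$. Message $m_v$ is disseminated in $C(v)$ by the start of the random phase. By Proposition 3.4, for every $u \in C(v)$, it holds that $|\hat{B}_u(t_0+t')| \le k+1-t'$ during the random phase.

Let $I_{u,v}$, for every $u \in C(v)$, be an indicator variable that indicates whether node $u$ sends $m_v$ during the random phase. In every round $t_0 + t'$ in which $m_v$ is still in the buffer, it is picked with probability $1/|\hat{B}_u(t_0+t')| \ge 1/(k+1-t')$, so it remains unsent with probability at most $(k-t')/(k+1-t')$. Since the product telescopes,

$$\Pr[I_{u,v} = 1] \ge 1 - \prod_{t'=0}^{r-1} \frac{k-t'}{k+1-t'} = 1 - \frac{k+1-r}{k+1} = \frac{r}{k+1} \ge \frac{r}{(3/2)k}.$$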

Let $X_v = \sum_{u \in C(v)} I_{u,v}$ be the number of nodes in $C(v)$ that send $m_v$ during the random phase, i.e., the number of times $m_v$ is received by every node in $C(v)$. Then

$$\mu = E(X_v) = E\Big[\sum_{u \in C(v)} I_{u,v}\Big] = \sum_{u \in C(v)} E(I_{u,v}) \ge \sum_{u \in C(v)} \frac{r}{(3/2)k} = \frac{2r}{3}.$$


Since $v$ is fixed, the indicator variables are independent, as they refer to decisions of distinct nodes. By applying a Chernoff bound [14, Chapter 4], we get


$$\Pr[X_v < (1-\delta)\mu] \le \exp(-\delta^2\mu/2) \le \exp(-\delta^2 a\log n/3).$$

By setting $\delta = 1/4$, we have $(1-\delta)\mu \ge (3/4)(2r/3) = r/2 = T$, so a message $m_v$ is non-fresh in all nodes $u \in C(v)$ with probability at least $1 - 1/n^{a/48}$. By a union bound, this holds for every node $v$ with probability at least $1 - 1/n^{a/48-1}$. □
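The arithmetic in this proof can be checked exactly with rational numbers. The sketch below (the function name is ours) verifies the telescoping product, the inequality $r/(k+1) \ge r/((3/2)k)$ for $k \ge 2$, the choice $(1-\delta)\cdot 2r/3 = T$ for $\delta = 1/4$, and the resulting exponent $1/48$.

```python
from fractions import Fraction


def send_probability_lower_bound(k, r):
    """1 - prod_{t'=0}^{r-1} (k - t') / (k + 1 - t'), evaluated exactly."""
    miss = Fraction(1)
    for t in range(r):
        miss *= Fraction(k - t, k + 1 - t)
    return 1 - miss


for k in range(2, 30):
    for r in range(1, k + 1):
        p = send_probability_lower_bound(k, r)
        assert p == Fraction(r, k + 1)                 # the product telescopes
        assert p >= Fraction(2 * r, 3 * k)             # r/(k+1) >= r/((3/2)k), k >= 2
        assert k * p >= Fraction(2 * r, 3)             # lower bound on mu = E(X_v)
        assert Fraction(3, 4) * Fraction(2 * r, 3) == Fraction(r, 2)  # (1-delta) 2r/3 = T

# delta^2 * mu / 2 with delta = 1/4 and mu >= (2/3) a log n gives a log n / 48.
assert Fraction(1, 4) ** 2 * Fraction(2, 3) / 2 == Fraction(1, 48)
```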


Definition 3.6. A pioneer message in node $u \in C_i$ at time $t_p$ (the beginning of ranking phase $p$) is a message $m_v \in R_u(t_p)$ that originated at $v \in C_{i-p-1}$.


Pioneer Attributes. If a message $m_v$ is a pioneer in node $u \in C_i$ at time $t_p$, then (i) $v \in L(u)$ (by Proposition 2.1, the message was transmitted over the shortest path), and the following hold at time $t_p$: (ii) $cnt_{u,v}(t_p) = 1$, and thus $m_v$ is fresh for $u$; (iii) $m_v \notin R_{u'}(t_p)$ for every $u' \in C_i$, $u' \neq u$ (by Proposition 2.1); (iv) $m_v$ is disseminated in $C_{i-1}$ (by the node that relayed $m_v$ to its neighbor in $C_i$); and (v) $m_v$ is fresh in every node $u' \in C_{i-1}$. The following is proved in the appendix.


Lemma 3.7. With probability at least $1 - 1/n^{a/24-1}$, at the end of the random phase, for every $i$, the number of pioneer messages that reach $C_i$ is at most $3r$.
3.2 Analysis of Ranking Phases


After analyzing the single random phase, here we analyze the ranking phases. 

Lemma 3.8. With probability at least $1 - 1/n^{d-2}$, every node $u$ that starts ranking phase $p$ with at most $8r$ fresh messages sends all of them during the phase.


The proof appears in the appendix. To prove Lemma 3.2, we show a sequence of four inductive properties that hold for ranking phase $p$ with high probability:


Property 1. For every $i$, $1 \le i \le n/k$, the number of messages $m_v$, $v \in C_{i-p-1}$, such that $m_v \in R_u(t_p)$ for some $u \in C_i$ (pioneers), is at most $3r$, and each reaches a distinct node $u \in L(v)$.

Property 2. For every $i$, $1 \le i \le n/k$, and every node $u \in C_i$, at time $t_p$ there are at most $4r$ fresh messages $m_v$ for node $u$ for each of the two directions of flow ($8r$ in total). All of them originated at nodes $v \in C_{i-p}$ (similarly, $v \in C_{i+p}$), except for at most one (a pioneer), which originated at $u' \in C_{i-p-1} \cap L(u)$ (similarly, $u' \in C_{i+p+1} \cap L(u)$). All messages $m_v \in R_u(t_p)$, $v \in C_{i-p}$ (similarly, $v \in C_{i+p}$), are fresh.

Property 3. For every $i$, $1 \le i \le n/k$, and every node $v \in C_{i-p}$, $m_v$ is fresh for at least $T$ nodes $u \in C_i$ at time $t_p$. Recall that $T = r/2$.

Property 4. For every $i$, $1 \le i \le n/k$, every node $u \in C_i$, and every node $v$ such that $v \in C_j$ for some $i - p < j \le i$, it holds that $m_v \in R_u(t_p)$ and $m_v$ is non-fresh.


We prove the four properties simultaneously by induction on the ranking phase number $p$. To prove the base case, we assume that all events described in Lemma 3.5, Lemma 3.7, and Lemma 3.8 (for $p = 1$) occur. Notice that, by a union bound, the probability for this is at least

$$1 - \left(\frac{1}{n^{a/48-1}} + \frac{1}{n^{a/24-1}} + \frac{1}{n^{d-2}}\right) \ge 1 - \left(\frac{2}{n^{a/48-1}} + \frac{1}{n^{d-2}}\right).$$

To prove the induction step, we assume that all events described in the four properties for $p - 1$, and in Lemma 3.8 for $p - 1$ and $p$, occur. By a union bound, this happens with probability at least one minus the sum of the failure probabilities of these events.

The complete inductive proof appears in the appendix. Property 4 guarantees that full information spreading is completed after ranking phase $p = n/k$. By a union bound over the $n/k$ phases, this holds with probability at least $1 - 1/n^c$ for a constant $c$, by fixing $d$ and $a$ to values $d > c + 3$ and $a > 48c + 96$. This completes the proof of Lemma 3.2, from which Theorem 3.1 follows.


4 Fault Tolerance 

Alg. 1 depends heavily on the random phase in the following sense. For every node $v$, consider the set of nodes in neighboring cliques that know message $m_v$ by the end of the random phase. Then, w.h.p., the algorithm spreads $m_v$ using the layers of the nodes in this set ("carriers"). This means that the paths of a message are fixed very early in the algorithm and do not change.

A single node failure in a carrier layer is sufficient to break down its role. Each message relies on at least $T$ different layers to proceed. Hence, the algorithm is sensitive to failures that leave fewer than $T$ non-faulty carrier layers.

At the beginning of ranking phase $p$, consider the case where a message $m_v$, $v \in C_{i-p}$, is fresh in only $x < T$ nodes in clique $C_i$, due to failures. The behavior of the algorithm in such a case is as follows. During the ranking phase, fewer than $T$ nodes in the clique send the message, so all other nodes in $C_i$ receive the message fewer than $T$ times; thus it stays fresh in all of them at the end of ranking phase $p$. Starting from the next ranking phase, the message $m_v$ propagates regularly over those $x < T$ carriers, but also propagates over all other carriers, with a delay of one phase. This means that every layer becomes responsible for one extra message (in addition to at most $8r$ messages), which may still be tolerable. In general, our algorithm can manage a constant number of such occurrences.

We aim to cope with a larger number of failures, so we modify our algorithm to help layers bypass their failing nodes and continue operating as carriers.

4.1 Shuffle Phases 


We invoke a shuffle phase between every two ranking phases, so the phases of the algorithm now proceed as described in Fig. 4. Roughly speaking, the objective of a shuffle phase is that the nodes of every clique re-divide their responsibilities over messages.

A shuffle phase consists of $8r$ rounds. During it, every node sends its fresh messages (and receives fresh messages from all neighbors). Instead of updating the regular cnt values, nodes use separate counters, phasecnt, to count the number of receptions of each message during the current shuffle phase. Recall that the objective is shuffling the fresh messages between nodes of the same clique. Thus, at the end of the shuffle phase, every node identifies and filters out unwanted messages, which are messages received from neighboring cliques (low phasecnt values) and messages that were already non-fresh prior to the start of the shuffle phase. Then it randomly picks $4r$ new fresh messages to start the next ranking phase with.
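The end-of-phase step can be sketched as follows. This is our illustration, not the paper's code: `end_of_shuffle` and its parameters are hypothetical names, `threshold` stands for the cut-off $c \cdot T$ used in Alg. 2, and when fewer than $4r$ messages survive the filter we simply keep all of them, which is an assumption.

```python
import random


def end_of_shuffle(phasecnt, nonfresh_before, threshold, r, rng):
    """Filter and select at the end of a shuffle phase (illustrative sketch).

    phasecnt:        message -> receptions counted during this shuffle phase
    nonfresh_before: messages that were already non-fresh when the phase started
    threshold:       the phasecnt cut-off (c * T in Alg. 2)
    Returns the selected messages with ranks 1..(at most 4r)."""
    kept = sorted(m for m, c in phasecnt.items()
                  if c >= threshold and m not in nonfresh_before)
    chosen = rng.sample(kept, min(4 * r, len(kept)))
    return {m: rank for rank, m in enumerate(chosen, start=1)}


rng = random.Random(1)
# "b" has a low phasecnt (it came from a neighboring clique) and "d" was
# already non-fresh, so only "a" and "c" survive the filter.
ranks = end_of_shuffle({"a": 9, "b": 1, "c": 7, "d": 8},
                       nonfresh_before={"d"}, threshold=5, r=1, rng=rng)
```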

The important gain from this cooperative division of responsibilities by the nodes of a clique is that a node $u \in C_i$ that does not receive new messages from its faulty neighbor $u' \in C_{i-1} \cap L(u)$ can overcome the failure of its carrier layer, and still take part in transmitting relevant messages from one clique to the next, with no delays. The proof of the following appears in the appendix.


Theorem 4.1. Alg. 2 completes full information spreading on $G_{n,k}$ in $O(\frac{n}{k}\log^3 n)$ rounds, w.h.p.


4.2 Resilience to Faults 

Algorithm 2 for each node u

    SyncRound($m_u$)                                   ▷ Round 0
    RandomPhase()
    loop
        RankingPhase()
        ShufflePhase()
    end loop

    ShufflePhase $p$
        $\hat{B}_u(t_p) \gets$ fresh messages in $B_u(t)$      ▷ $t = t_p$
        for all $m_v \in \hat{B}_u(t_p)$ do
            $phasecnt_{u,v} \gets 1$
        end for
        $R \gets \hat{B}_u(t_p)$
        loop $8r$ times
            if $\hat{B}_u(t_p) = \emptyset$ then
                send own message $m_u$
            else
                pop and send a fresh message from $\hat{B}_u(t_p)$
            end if
            $R' \gets$ received messages
            for all $m_v \in R'$ do
                if $m_v \notin R$ then
                    $phasecnt_{u,v} \gets 1$
                else
                    $phasecnt_{u,v} \gets phasecnt_{u,v} + 1$
                end if
                $R \gets R \cup \{m_v\}$
            end for
            $t \gets t + 1$
        end loop
        $R \gets R$ after filtering out unwanted messages
                ▷ Filter out messages $m_v$ with $phasecnt_{u,v} < c \cdot T$
                ▷ Filter out messages that were non-fresh prior to the start of the phase
        $R_u(t) \gets R_u(t) \cup R$
        Select $4r$ messages from $R$ randomly, rank them from 1 to $4r$.

Recall that we consider a model of independent failures of nodes, where each node fails at each round with probability $q$ and never recovers. Let $r_e = O(\frac{n}{k}\log^3 n)$ (the round number at the end of ranking phase $n/k$ in Alg. 2). First, we prove the following. The proof appears in the appendix.


Lemma 4.2. At the end of round $r_e$, the number of non-faulty nodes in each clique is at least $30k/32$, with probability at least $1 - 1/n^{30}$.

We show that the algorithm tolerates failures for any $q$ with $0 < q < O\left(\frac{k}{n\log^3 n}\right)$.


Theorem 4.3. Alg. 2 completes full information spreading on $G_{n,k}$ in $O\left(\frac{n}{k}\log^3 n\right)$ rounds, w.h.p., for any node failure probability per round $q$ with $0 < q < O\left(\frac{k}{n\log^3 n}\right)$.
Proof. Fix $i, p$. Let $m_v$ be a message that is fresh in at least $T$ (non-faulty) nodes in $C_{i-1}$ at the end of shuffle phase $p-1$. We analyze the probability that $m_v$ is not shuffled successfully in clique $C_i$.

An unsuccessful shuffle can occur for two reasons. Either the phasecnt values in $C_i$ at the end of shuffle phase $p$ are smaller than the threshold $T^* = cT$, so the message is filtered out (denote this event by $A$). Or the message is selected by fewer than $T$ (non-faulty) nodes. By Lemma 3.8, at the beginning of shuffle phase $p$ the message $m_v$ is supposed to be fresh in at least $T$ nodes in $C_i$. Each of these nodes gets the message from its respective neighbor in $C_{i-1}$. If one of these nodes in $C_i$ does not send $m_v$ during shuffle phase $p$, then the node, its neighbor in $C_{i-1}$, or both become faulty by the end of shuffle phase $p$. By Bernoulli's inequality, the probability $\hat{q}$ that such a pair of nodes does not fail is bounded from below by
$$\hat{q} = \left((1-q)^{r_e}\right)^2 \ge (1 - q r_e)^2 \ge 1 - 2 q r_e \ge 1 - \frac{1}{16}.$$
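Here is a numeric sanity check of the chain above. It uses illustrative values: $r_e = 10{,}000$, and the largest $q$ with $q r_e \le 1/32$, which is what the last inequality requires. The exact pair-survival probability $((1-q)^{r_e})^2$ should dominate the bound $1 - 2qr_e$, and that bound should be at least $15/16$. The function name `pair_survival` is hypothetical.

```python
def pair_survival(q, r_e):
    """Exact probability that two nodes both survive r_e rounds when each fails
    independently with probability q per round, and its Bernoulli lower bound."""
    exact = ((1 - q) ** r_e) ** 2
    bernoulli_bound = 1 - 2 * q * r_e
    return exact, bernoulli_bound


if __name__ == "__main__":
    r_e = 10_000
    q = 1 / (32 * r_e)  # the largest q allowed by q * r_e <= 1/32
    exact, bound = pair_survival(q, r_e)
    print(f"exact={exact:.6f}  bound={bound:.6f}  target={15 / 16:.6f}")
```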

Fix a set $S(m_v) \subseteq C_{i-1} \times C_i$ of $T$ pairs of nodes. Each pair consists of a node in $C_{i-1}$ that knows $m_v$ at the end of shuffle phase $p-1$, together with its respective neighbor in $C_i$. There might exist more than $T$ such pairs. By fixing a set of size $T$ and ignoring the rest, we bound the probability of an unsuccessful shuffle from above, since the ignored nodes can only increase the probability of success. A "surviving" pair is a pair of nodes from $S(m_v)$ in which both nodes are non-faulty at the end of the shuffle phase. Such a pair therefore functions properly (by sending $m_v$) during shuffle phase $p$. Denote by $s$ the number of surviving pairs. Since the pairs are disjoint and each survives independently with probability at least $15/16$, we have:


$$\Pr[A] \le \sum_{s=0}^{T^*-1} \binom{T}{s}\left(\frac{1}{16}\right)^{T-s}.$$


We sum over all $s \in \{0, \ldots, T^*-1\}$. These are the cases in which the number of surviving pairs is below the threshold $T^* = cT$, so the message $m_v$ is improperly filtered out at the end of the shuffle phase due to a low phasecnt value.
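The following sketch evaluates the tail bound on $\Pr[A]$ numerically for a hypothetical threshold constant $c = 0.25$, since the value of $c$ used in the paper is not recoverable from this excerpt. It also compares the bound with the exact binomial tail obtained when each of the $T$ pairs survives with probability exactly $15/16$. The function names are illustrative.

```python
from math import ceil, comb


def filtered_out_bound(T, c, pair_fail=1 / 16):
    """Upper bound on Pr[A]: fewer than T* = c*T of the T fixed pairs survive,
    where each pair fails independently with probability at most pair_fail.
    Each term drops the survival factor (at most 1), as in the bound above."""
    t_star = ceil(c * T)
    return sum(comb(T, s) * pair_fail ** (T - s) for s in range(t_star))


def filtered_out_exact(T, c, pair_fail=1 / 16):
    """Exact binomial tail Pr[s < c*T] when each pair survives w.p. 1 - pair_fail."""
    t_star = ceil(c * T)
    p = 1 - pair_fail
    return sum(comb(T, s) * p ** s * pair_fail ** (T - s) for s in range(t_star))


if __name__ == "__main__":
    for T in (10, 20, 40):
        print(T, filtered_out_exact(T, 0.25), filtered_out_bound(T, 0.25))
```

The bound shrinks exponentially in $T$. Since $T$ grows with $\tau = \alpha\log n$, this is consistent with the inverse-polynomial estimate claimed next.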

By choosing a sufficiently small constant $c > 0$, we get that $\Pr[A] < 1/n^{\alpha/3-1}$ (see the calculation in the appendix). Namely, the message is not filtered out with probability at least $1 - 1/n^{\alpha/3-1}$. By Lemma 4.2, the number of non-faulty nodes in each clique is at least $30k/32$ with probability at least $1 - 1/n^{30}$. An analysis similar to the one in the proof of Lemma 6.3 (with $\delta = 1/15$) shows the following. Once the message is not filtered out, it is selected by at least $T$ of the non-faulty nodes in $C_i$ with probability at least $1 - 1/n^{\alpha/(15\cdot 16)}$. In total, by a union bound, the probability that a message is not shuffled successfully between two consecutive shuffle phases is at most $\frac{1}{n^{\alpha/3-1}} + \frac{1}{n^{\alpha/(15\cdot 16)}} + \frac{1}{n^{30}}$, for the value of $\alpha$ fixed earlier.

We apply a union bound two more times, over all messages and over all phases. This gives an inverse-polynomial upper bound on the probability that some message is not propagated properly. Hence, in the given model, the algorithm tolerates failures that occur with probability $0 < q < O\left(\frac{k}{n\log^3 n}\right)$, w.h.p. □


5 Discussion 

Static-Routes Algorithms. Let ALG be an algorithm that spreads information on $k$-vertex-connected graphs in $O\left(\frac{n}{k}\cdot\mathrm{polylog}(n)\right)$ rounds. It does so by constructing static routes and using them to disseminate messages in parallel, each message on a specific route. This makes ALG very sensitive to failures, as a single failure in a route suffices to render the entire route faulty.

However, ALG can easily be reconfigured so that vertex-disjoint routes are combined into groups of size $\gamma$, and every node duplicates its messages and sends them concurrently over these components. Notice that in $k$-vertex-connected graphs, $\gamma$ is bounded from above by $k$. The trade-off is a slowdown by a factor of $\gamma$ in the runtime. Denote this configuration of the algorithm by ALG($\gamma$).

We are interested in cases where $\gamma = O(\mathrm{polylog}(n))$, so that the runtime of the algorithm remains $O\left(\frac{n}{k}\cdot\mathrm{polylog}(n)\right)$. Every combination of $\gamma$ vertex-disjoint routes induces a $\gamma$-vertex-connected subgraph, as it stays connected after the removal of any $\gamma - 1$ vertices. Each component functions as long as it stays connected. According to [3, Theorem 1.5], for $\gamma = \Omega(\log^3 n)$, such a component stays connected w.h.p. if its nodes are sampled independently with a constant probability. We can view the failures as a sampling process, in which the non-faulty nodes are the sampled ones. Then each component stays connected if a constant fraction of its nodes stays non-faulty during the execution, so it tolerates the failure of a constant fraction of its nodes. In the presence of faults, the additional slowdown factor for each message to spread over such a component can be loosely bounded from above by $O(\gamma)$. This is because the combined component is $O(\gamma)$ times the size of its original routes, and in the worst case a message traverses all non-faulty nodes of the component. In total, this configuration of the algorithm tolerates the failure of a constant fraction of nodes during its execution. This matches a failure probability of $q = O\left(\frac{k}{n\cdot\mathrm{polylog}(n)}\right)$ per round, while preserving a time complexity of $O\left(\frac{n}{k}\cdot\mathrm{polylog}(n)\right)$.
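The runtime and failure-rate accounting in this paragraph can be summarized as follows. $T_{\mathrm{ALG}(\gamma)}$ is notation introduced here for the round complexity of ALG($\gamma$). The second line uses a union bound over the rounds of the execution to turn the per-round failure probability into a per-execution one.

```latex
T_{\mathrm{ALG}(\gamma)}
  = \underbrace{\gamma}_{\text{duplication}}
    \cdot \underbrace{O(\gamma)}_{\text{detours in a component}}
    \cdot O\!\left(\frac{n}{k}\,\mathrm{polylog}(n)\right)
  = O\!\left(\frac{n}{k}\,\mathrm{polylog}(n)\right)
  \quad \text{for } \gamma = O(\mathrm{polylog}(n)),

\Pr[\text{a given node fails during the execution}]
  \le q \cdot T_{\mathrm{ALG}(\gamma)}
  = O(1)
  \quad \text{for } q = O\!\left(\frac{k}{n\,\mathrm{polylog}(n)}\right).
```

With a small enough constant in the bound on $q$, the expected fraction of failed nodes is a small constant. This is the regime in which the sampling argument of [3] applies.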

The algorithm presented in [2] is static-route, as it constructs CDS packings and routes messages over them. The CDS packings are only fractionally vertex-disjoint, which requires a few modifications to the above analysis. However, even with this fix, the algorithm remains vulnerable due to its preprocessing stage. Tolerating failures that occur during the preprocessing stage is more complicated, and the construction of CDS packings in the presence of failures is still an open problem.

Summary. In this paper, we present an information spreading algorithm and prove that it is fast and robust on $G_{n,k}$. The main open question is whether this approach can work for general $k$-vertex-connected graphs.

To summarize, we consider devising a fast and robust information spreading algorithm in the Vertex-Congest model an intriguing open problem, and view our result as a first step in this direction. Our algorithm uses probability distributions that change over time according to how the execution unfolds. This technique may have applications in other settings as well.
6 Appendix

6.1 The Graph $G_{n,k}$



Fig. 2. $G_{n,k}$ is an example of a $k$-vertex-connected graph with diameter $\Theta(n/k)$.
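The structure of $G_{n,k}$ is only depicted in Fig. 2, which is not reproduced here. The following model is an assumption inferred from the appendix text: a chain of cliques in which each node has a corresponding neighbor in each adjacent clique, on the same layer $L(\cdot)$. The Python sketch builds such a graph and computes its diameter by breadth-first search. `build_gnk` and the $(i, j)$ node labels (clique index, layer index) are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_gnk(n, k):
    """Hypothetical model of G_{n,k}: cliques C_1..C_{n/k} of size k arranged in
    a chain, where node j of each clique is also adjacent to node j of the
    previous and next cliques (the same layer)."""
    if n % k:
        raise ValueError("k must divide n")
    m = n // k
    adj = {(i, j): set() for i in range(m) for j in range(k)}
    for i in range(m):
        for a, b in combinations(range(k), 2):  # edges inside clique i
            adj[(i, a)].add((i, b))
            adj[(i, b)].add((i, a))
        if i + 1 < m:
            for j in range(k):  # perfect matching to clique i + 1
                adj[(i, j)].add((i + 1, j))
                adj[(i + 1, j)].add((i, j))
    return adj


def eccentricity(adj, src):
    """Largest BFS distance from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)


if __name__ == "__main__":
    g = build_gnk(24, 4)
    print("diameter:", diameter(g))  # 6, i.e. n/k when k >= 2
    print("min degree:", min(len(nbrs) for nbrs in g.values()))  # 4, i.e. k
```

In this model, the minimum degree equals $k$, attained in the two end cliques. This is consistent with $k$-vertex-connectivity.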


6.2 The Uniform Random Algorithm 

We consider the uniform random algorithm, in which, in each round, every node picks a message from its buffer uniformly at random and sends it. We show that the time complexity of this algorithm is asymptotically much worse than the optimal $\Omega(n/k)$. Consider the uniform random algorithm running on the graph $G_{n,k}$. We prove that the expected number of rounds for full information spreading is $\Omega(n/\sqrt{k})$. First, we prove the following lemma.
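To make the setting concrete, here is a small simulation of the uniform random algorithm. It assumes a chain-of-cliques model of $G_{n,k}$ (an assumption, since Fig. 2 is not reproduced) and identifies message $m_u$ with node $u$. As in the proof below, a node's buffer holds the messages it knows but has not yet sent. The sketch only illustrates the process. It does not verify the asymptotic $\Omega(n/\sqrt{k})$ bound, and the helper names are hypothetical.

```python
import random
from itertools import combinations


def gnk_neighbors(n, k):
    """Hypothetical G_{n,k}: chain of n/k cliques of size k; node j of clique i
    is also adjacent to node j of cliques i - 1 and i + 1."""
    m = n // k
    nbrs = {(i, j): [] for i in range(m) for j in range(k)}
    for i in range(m):
        for a, b in combinations(range(k), 2):
            nbrs[(i, a)].append((i, b))
            nbrs[(i, b)].append((i, a))
        if i + 1 < m:
            for j in range(k):
                nbrs[(i, j)].append((i + 1, j))
                nbrs[(i + 1, j)].append((i, j))
    return nbrs


def uniform_random_spreading(n, k, seed=0):
    """Rounds until every node knows all n messages when, in each round, every
    node sends one message chosen uniformly at random from its buffer to all
    of its neighbours."""
    rng = random.Random(seed)
    nbrs = gnk_neighbors(n, k)
    known = {u: {u} for u in nbrs}   # message m_u is identified with node u
    buffer = {u: [u] for u in nbrs}  # known but not yet sent
    rounds = 0
    while any(len(known[u]) < n for u in nbrs):
        rounds += 1
        sent = {}
        for u in nbrs:
            if buffer[u]:
                # Uniform choice, removed by swapping it to the end and popping.
                idx = rng.randrange(len(buffer[u]))
                buffer[u][idx], buffer[u][-1] = buffer[u][-1], buffer[u][idx]
                sent[u] = buffer[u].pop()
        for u, msg in sent.items():  # deliveries happen after all sends
            for v in nbrs[u]:
                if msg not in known[v]:
                    known[v].add(msg)
                    buffer[v].append(msg)
    return rounds


if __name__ == "__main__":
    for n, k in ((24, 4), (48, 4), (48, 16)):
        print(n, k, uniform_random_spreading(n, k))
```

The loop always terminates. Every message a node knows is either still in its buffer or has already been delivered to all of its neighbors.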

Lemma 6.1. If the buffer size of a node is at least $n/4$, then the number of rounds needed for a message $m_v$ in its buffer to be sent is $\Omega(n)$ in expectation.

Proof. During the first $n/8$ rounds, the buffer size is at least $n/4 - n/8 = n/8$. Hence, in each of these rounds, $m_v$ is sent with probability at most $8/n$. The number of rounds until $m_v$ is first sent is therefore bounded from below by a geometric random variable with success probability $p = 8/n$. The expectation of such a geometric random variable is $1/p = n/8$, and the lemma follows. □
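Here is a quick Monte Carlo check of the intuition behind Lemma 6.1, under the simplifying assumption that no new messages enter the buffer. In that case, the round in which a fixed message is sent is uniform over $1, \ldots, b$ for a buffer of size $b$. Its expectation is therefore $(b+1)/2 = n/8 + 1/2$ for $b = n/4$. The helper names are hypothetical.

```python
import random


def rounds_until_sent(buffer_size, rng):
    """Rounds until a fixed message leaves a buffer of `buffer_size` messages
    when one uniformly random message is sent per round and no new messages
    arrive (a simplifying assumption)."""
    buffer = list(range(buffer_size))  # message 0 plays the role of m_v
    rounds = 0
    while True:
        rounds += 1
        sent = buffer.pop(rng.randrange(len(buffer)))
        if sent == 0:
            return rounds


def average_rounds(buffer_size, trials=20_000, seed=1):
    rng = random.Random(seed)
    total = sum(rounds_until_sent(buffer_size, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n = 256
    b = n // 4
    print(average_rounds(b), (b + 1) / 2)  # empirical mean vs. n/8 + 1/2
```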

Theorem 1.1. The uniform random algorithm requires $\Omega(n/\sqrt{k})$ rounds on $G_{n,k}$, in expectation.

Proof. To prove the theorem, we define a partition of the probability space and calculate the conditional expectation of the number of rounds $X$ in each case. We show that the expected number of rounds in every case is $\Omega(n/\sqrt{k})$. The theorem then follows by the law of total expectation:
$$\mathbb{E}[X] = \sum_i \mathbb{E}[X \mid A_i]\,\Pr[A_i],$$
where $\{A_i\}$ is the partition.

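Let $r_0$ be the random variable denoting the first round in which the buffer size of every node in clique $C_{n/k}$ is at least $n/2$. The buffer of every node consists of the messages it knows but has not yet sent. Consider the executions in which such a round does not exist, and let $t$ be the last round of the dissemination process. At round $t$, every node knows all $n$ messages and has sent at most $t$ of them, so its buffer has size at least $n - t$. Hence $n - t < n/2$, implying $t > n/2 > n/\sqrt{k}$.

Otherwise, $r_0$ is well defined, and we show that
$$\mathbb{E}[X \mid r_0 \text{ is well defined}] \ge \frac{n}{32\sqrt{k}}. \qquad (1)$$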


Consider the set of messages $M_1 = \{m_v \mid m_v \text{ is known to some node } u \in C_{n/k}\}$. We analyze two possible cases:


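1. If $|M_1| < n$ at $r_0$, then there exists a message that is not known to any node in $C_{n/k}$ at round $r_0$. Let $m_v$ be such a message. Let $r_1$ be the random variable denoting the number of rounds from $r_0$ until $m_v$ spreads to all nodes of $C_{n/k}$. We argue that $\mathbb{E}[r_0 + r_1] \ge \frac{n}{32\sqrt{k}}$. Since $r_0$ and $r_1$ are non-negative, the following holds trivially:
$$\mathbb{E}\left[r_0 + r_1 \,\middle|\, r_0 \ge \frac{n}{\sqrt{k}} \text{ or } r_1 \ge \frac{n}{24}\right] \ge \frac{n}{32\sqrt{k}}. \qquad (2)$$
To conclude the argument for this case, it is enough to show that
$$\mathbb{E}\left[r_0 + r_1 \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \ge \frac{n}{32\sqrt{k}}. \qquad (3)$$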


In the following, we assume that $r_0 < n/\sqrt{k}$ and $r_1 < n/24$, and give a lower bound for $\mathbb{E}\left[r_1 \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right]$. In addition, we assume that $k \le n/6$.
At round $r_0$, at least $n/2 - k \ge n/3$ messages are disseminated in $C_{n/k-1}$, for otherwise the messages do not reach the nodes of $C_{n/k}$. During the $r_1$ rounds in the interval $[r_0, r_0 + r_1]$, all buffers of nodes in $C_{n/k}$ have size at least $n/2 - r_1$. Likewise, all buffers of nodes in $C_{n/k-1}$ have size at least $n/3 - r_1$. Since $r_1 < n/24$, all buffers of nodes in both cliques have size at least $n/4$ during these $r_1$ rounds. Since buffer sizes are at least $n/4$ and at most $n$, consider a node $v \in C_{n/k-1} \cup C_{n/k}$ that knows $m_v$. The probability $q$ that it sends $m_v$ in a given round satisfies $1/n \le q \le 4/n$. Let $X_r$ be the number of nodes in $C_{n/k-1}$ that send $m_v$ during the $r$ rounds that follow round $r_0$. By Bernoulli's inequality (used in the first inequality of the chain after the leading bound),
$$\mathbb{E}\left[X_r \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \le k\left(1 - (1-q)^r\right) \le k\left(1 - (1 - qr)\right) = kqr \le \frac{4kr}{n}.$$
Each node in $C_{n/k-1}$ that sends the message relays it to its corresponding neighbor in $C_{n/k}$. If any of these nodes in $C_{n/k}$ sends $m_v$, then the message is disseminated and all $k$ nodes in $C_{n/k}$ know it.

For every $r \ge n/k$, denote by $A_r$ the event $X_r < \frac{8kr}{n}$. By Markov's inequality, $\Pr\left[X_r \ge \frac{8kr}{n} \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \le \frac{1}{2}$, and hence,


$$\Pr\left[A_r \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \ge \frac{1}{2}.$$


Under the assumption that $A_r$ occurs, the probability that the dissemination occurs during these $r$ rounds (implying that $r_1 < r$) is at most
$$1 - \left((1-q)^r\right)^{8kr/n} = 1 - (1-q)^{8kr^2/n} \le 1 - \left(1 - \frac{8kr^2 q}{n}\right) = \frac{8kr^2 q}{n} \le \frac{32kr^2}{n^2},$$
where the first inequality is by Bernoulli's inequality. By assigning $r = \frac{n}{8\sqrt{k}} \ge \frac{n}{k}$, we get that


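$$\Pr\left[r_1 \ge \frac{n}{8\sqrt{k}} \,\middle|\, A_r,\ r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \ge 1 - \frac{32k\,\left(n/(8\sqrt{k})\right)^2}{n^2} = \frac{1}{2},$$
and hence
$$\begin{aligned}
\Pr\left[r_1 \ge \frac{n}{8\sqrt{k}} \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right]
&\ge \Pr\left[r_1 \ge \frac{n}{8\sqrt{k}},\ A_r \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \\
&= \Pr\left[r_1 \ge \frac{n}{8\sqrt{k}} \,\middle|\, A_r,\ r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \cdot \Pr\left[A_r \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \\
&\ge \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4},
\end{aligned}$$
where the equality follows from the definition of conditional probability, $P(A \cap B) = P(A \mid B)\,P(B)$. This gives
$$\mathbb{E}\left[r_1 \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_1 < \frac{n}{24}\right] \ge \frac{1}{4}\cdot\frac{n}{8\sqrt{k}} = \frac{n}{32\sqrt{k}},$$
which proves (3).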

2. If $|M_1| = n$ at $r_0$, let $r_2$ be the random variable denoting the number of rounds from $r_0$ until all messages are disseminated in $C_{n/k}$. We argue that $\mathbb{E}[r_0 + r_2] \ge \frac{n}{32\sqrt{k}}$. Since $r_0$ and $r_2$ are non-negative, the following holds trivially:


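$$\mathbb{E}\left[r_0 + r_2 \,\middle|\, r_0 \ge \frac{n}{\sqrt{k}} \text{ or } r_2 \ge \frac{n}{24}\right] \ge \frac{n}{32\sqrt{k}}. \qquad (4)$$
To conclude the argument for this case, it is enough to show that
$$\mathbb{E}\left[r_0 + r_2 \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_2 < \frac{n}{24}\right] \ge \frac{n}{32\sqrt{k}}. \qquad (5)$$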


In the following, we assume that $r_0 < n/\sqrt{k}$ and $r_2 < n/24$, and give a lower bound for $\mathbb{E}\left[r_2 \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_2 < \frac{n}{24}\right]$. In addition, we assume that $k \le n/6$.


In each round, a node in $C_{n/k}$ receives at most $k$ new messages and sends one, so its buffer size can increase by at most $k - 1$ in a single round. By the definition of $r_0$, at round $r_0$ there exists a node in $C_{n/k}$ with buffer size at most $n/2 + k - 1 < 2n/3$. This implies that the number of messages disseminated in $C_{n/k}$ is at most $n/2 + r_0 + k - 1 \le 3n/4$. Hence, at round $r_0$, at least $n/4$ messages are not disseminated in $C_{n/k}$ but are known to some nodes of the clique. Denote the set of these messages by $M_2$. Messages in $M_2$ were not received from nodes within the clique $C_{n/k}$, since otherwise they would be disseminated. Therefore, at round $r_0$, every node in $C_{n/k}$ knows at most $r_0 < n/\sqrt{k}$ such messages. During the $r_2$ rounds in the interval $[r_0, r_0 + r_2]$, all buffers of nodes in $C_{n/k}$ have size at least $n/2 - r_2 \ge n/2 - n/\sqrt{k} - n/24 \ge n/4$. Since buffer sizes are at least $n/4$, the probability $q$ that a node in $C_{n/k}$ sends a message from $M_2$ in a single round is at most $\frac{n/\sqrt{k}}{n/4} = \frac{4}{\sqrt{k}}$. For the dissemination process to complete, each message in $M_2$ must be sent at least once by some node in $C_{n/k}$. The only alternative is that it is sent by all nodes of $C_{n/k-1}$, which happens only after $\Omega(n)$ rounds in expectation, by Lemma 6.1. Consider the sending of a message from $M_2$ a success; each trial succeeds with probability at most $q$. The dissemination process completes only after at least $|M_2| \ge n/4$ successes. Denote by $X$ the random variable of the number of trials needed to reach $|M_2|$ successes. Then $X$ is a negative binomial variable, $X \sim NB(|M_2|, q)$, with expectation $|M_2|/q \ge \frac{n/4}{4/\sqrt{k}} = \frac{n\sqrt{k}}{16}$ trials. Each round contributes $k$ trials (one per node of the clique). Hence, the expected number $r_2$ of additional rounds before all messages are disseminated in $C_{n/k}$ is at least $\frac{n\sqrt{k}/16}{k} = \frac{n}{16\sqrt{k}}$.
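We get that
$$\mathbb{E}\left[r_2 \,\middle|\, r_0 < \frac{n}{\sqrt{k}},\ r_2 < \frac{n}{24}\right] \ge \frac{n}{16\sqrt{k}} \ge \frac{n}{32\sqrt{k}},$$
which proves (5).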


In summary, we covered the whole probability space by combinations of events that form a partition, and proved that the conditional expectation in each case is $\Omega(n/\sqrt{k})$. Hence, by the law of total expectation, the theorem follows. □
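Case 2 above reduces the completion time to a negative binomial count. The sketch below checks by simulation that the expected number of Bernoulli($p$) trials needed for $s$ successes is $s/p$. It also reproduces the case-2 arithmetic, taking the bounds $|M_2| \ge n/4$ and $q \le 4/\sqrt{k}$ from the text as given. The helper names are hypothetical.

```python
import random


def trials_until_successes(successes, p, rng):
    """Number of Bernoulli(p) trials needed to collect `successes` successes
    (a negative binomial count that includes the successful trials)."""
    trials = got = 0
    while got < successes:
        trials += 1
        if rng.random() < p:
            got += 1
    return trials


def expected_rounds_lower_bound(n, k):
    """Case-2 arithmetic: at least n/4 successes are needed, each trial succeeds
    with probability at most 4/sqrt(k), and each round gives k trials."""
    m2, q = n / 4, 4 / k ** 0.5
    return (m2 / q) / k  # = n / (16 * sqrt(k))


if __name__ == "__main__":
    rng = random.Random(3)
    runs = [trials_until_successes(20, 0.1, rng) for _ in range(5_000)]
    print(sum(runs) / len(runs), 20 / 0.1)          # empirical mean vs. s/p
    print(expected_rounds_lower_bound(1024, 64))    # 8.0 = 1024 / (16 * 8)
```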


6.3 Missing Proofs 



Fig. 3. Phases of Alg. 1.
Lemma 2.2. Let $m$ be a message with rank $r \le 8\tau$ (recall that $\tau = \alpha\log n$). Then $m$ is sent during the ranking phase with probability at least $1 - n^{-d}$.


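Proof. Let $A$ be the event that the message with rank $r$ is not picked during a phase of $\tau' = 8d\tau\log^2 n = 8d\alpha\log^3 n$ rounds, and let $H_b$ denote the $b$-th harmonic number. We wish to bound from above the probability of event $A$:
$$\begin{aligned}
\Pr[A] &\le \left(1 - \frac{1}{r\cdot H_b}\right)^{8d\alpha\log^3 n}
\le \left(1 - \frac{1}{r(\ln b + 1)}\right)^{8d\alpha\log^3 n}
\le \left(1 - \frac{1}{r\log n}\right)^{8d\alpha\log^3 n} \\
&= \left(\left(1 - \frac{1}{r\log n}\right)^{r\log n}\right)^{8d\alpha\log^2 n / r}
\le \left(\frac{1}{2}\right)^{8d\alpha\log^2 n / r}
\le \left(\frac{1}{2}\right)^{d\log n} = n^{-d}.
\end{aligned}$$
The second inequality holds because $H_b \le \ln b + 1$. The second-to-last inequality holds since $(1 - 1/x)^x \le e^{-1} < 1/2$ for $x \ge 1$. The last inequality holds since $r \le 8\tau = 8\alpha\log n$. Namely, any message with $r \le 8\tau = 8\alpha\log n$ is sent during the phase with probability at least $1 - n^{-d}$. □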
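The proof uses the fact that, in each round of a ranking phase, the message of rank $r$ is picked with probability at least $1/(r\cdot H_b)$. The actual RankingPhase of Alg. 1 is not part of this excerpt. As a hypothetical reading of that bound, the sketch below assumes a harmonic (1/rank) selection rule. It evaluates the miss probability $(1 - 1/(r H_b))^{\tau'}$ for the illustrative constants $n = 2^{10}$ and $\alpha = d = 1$, with logarithms taken base 2.

```python
import math
import random


def harmonic(b):
    """H_b = 1 + 1/2 + ... + 1/b."""
    return sum(1 / i for i in range(1, b + 1))


def pick_rank(b, rng):
    """Hypothetical ranking-phase choice: rank r in 1..b w.p. (1/r) / H_b."""
    ranks = range(1, b + 1)
    return rng.choices(ranks, weights=[1 / r for r in ranks])[0]


def miss_probability(r, b, rounds):
    """Probability that rank r is never picked in `rounds` independent picks."""
    return (1 - 1 / (r * harmonic(b))) ** rounds


if __name__ == "__main__":
    n, alpha, d = 2 ** 10, 1, 1            # illustrative constants only
    log_n = math.log2(n)
    tau = alpha * log_n                     # tau = alpha log n
    phase_len = round(8 * d * alpha * log_n ** 3)
    r, b = round(8 * tau), n                # worst admissible rank, largest buffer
    print(miss_probability(r, b, phase_len), "<=", n ** -d)
```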


Lemma 3.7. With probability at least $1 - 1/n^{\alpha/24-1}$, at the end of the random phase, for every $i$, the number of pioneer messages that reach $C_i$ is at most $3\tau$.


Proof. By the definition of pioneers, and considering the direction of the flow of messages, cliques $C_1$ and $C_2$ cannot have pioneer messages. Fix $i$, $3 \le i \le n/k$. By Proposition 3.4, at the beginning of the random phase, for every node $u \in C_{i-1}$, the buffer $B_u(t_0)$ contains exactly one unique message $m_v$, $v \in C_{i-2} \cap L(u)$. Moreover, $|B_u(t_0 + t')| = k + 1 - t'$ during the random phase, as $C_{i-1}$ is an inner clique.

For every $u \in C_{i-1}$, let $\mathbb{1}_u$ be an indicator variable that indicates whether node $u$ sends its unique message during the random phase. Then
$$\Pr[\mathbb{1}_u = 0] = \prod_{t'=0}^{\tau-1} \frac{k - t'}{k + 1 - t'} = \frac{k + 1 - \tau}{k + 1} = 1 - \frac{\tau}{k+1}, \qquad \Pr[\mathbb{1}_u = 1] = \frac{\tau}{k+1}.$$
Let $X_{i-1} = \sum_{u \in C_{i-1}} \mathbb{1}_u$ be the number of messages $m_v$, $v \in C_{i-2}$, that reach clique $C_i$ by the end of the random phase. Then
$$\mu = \mathbb{E}(X_{i-1}) = \mathbb{E}\left(\sum_{u \in C_{i-1}} \mathbb{1}_u\right) = \sum_{u \in C_{i-1}} \mathbb{E}(\mathbb{1}_u) = k \cdot \frac{\tau}{k+1},$$
which means that $\tau/2 \le \mu \le \tau$. The indicator variables are independent, as they refer to decisions of distinct nodes. By applying a Chernoff bound, we get
$$\Pr[X_{i-1} \ge (1+\delta)\mu] \le \exp\left(-\frac{\delta^2\mu}{3}\right) \le \exp\left(-\frac{\delta^2\alpha\log n}{6}\right) \le \frac{1}{n^{\alpha\delta^2/6}}.$$
By setting $\delta = 1/2$, we get the following. With probability at least $1 - 1/n^{\alpha/24}$, the number of pioneer messages that reach $C_i$ from one direction is at most $(3/2)\tau$. By a union bound, this holds for both directions and every clique with probability at least $1 - 1/n^{\alpha/24-1}$. □
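The product for $\Pr[\mathbb{1}_u = 0]$ telescopes, since each numerator $k - t'$ cancels the denominator $k + 1 - (t'+1)$ of the next factor. The following exact-arithmetic check confirms $\Pr[\mathbb{1}_u = 0] = (k+1-\tau)/(k+1)$ and $\tau/2 \le \mu \le \tau$. It uses hypothetical helper names and assumes $\tau \le k$, so that the buffer never empties during the random phase.

```python
from fractions import Fraction


def prob_unique_not_sent(k, tau):
    """Pr[1_u = 0]: over tau rounds the node sends a uniformly random message
    from a buffer of size k + 1 - t' and never picks its unique message."""
    p = Fraction(1)
    for t in range(tau):
        p *= Fraction(k - t, k + 1 - t)
    return p


def expected_pioneers(k, tau):
    """mu = E[X_{i-1}] = k * Pr[1_u = 1]."""
    return k * (1 - prob_unique_not_sent(k, tau))


if __name__ == "__main__":
    print(prob_unique_not_sent(20, 5), expected_pioneers(20, 5))  # 16/21 100/21
```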


Lemma 3.8. With probability at least $1 - 1/n^{d-2}$, every node $u$ that starts ranking phase $p$ with at most $8\tau$ fresh messages sends all of them during the phase.


Proof. Fix a node $u$. All fresh messages $m_v \in R_u(t)$ have rank $r \le 8\tau$. According to Lemma 2.2, a message with rank $r \le 8\tau$ is sent during a ranking phase with probability at least $1 - n^{-d}$. By a union bound, node $u$ sends all of its fresh messages during the phase with probability at least $1 - n^{-d}\cdot 8\tau \ge 1 - \frac{1}{n^{d-1}}$. By a second union bound, over all nodes $u$, this happens for every node with probability at least $1 - \frac{1}{n^{d-1}}\cdot n = 1 - \frac{1}{n^{d-2}}$. □


Property 1. For every $i$, $1 \le i \le n/k$, consider the messages $m_v$, $v \in C_{i-p-1}$, such that $m_v \in R_u(t_p)$ for some $u \in C_i$ (pioneers). There are at most $3\tau$ such messages, and each reaches a distinct node $u \in L(v)$.


Property 2. For every $i$, $1 \le i \le n/k$, and every node $u \in C_i$, at time $t_p$ there are at most $4\tau$ fresh messages $m_v$ for node $u$ for each of the two directions of flow ($8\tau$ in total). All of them originated at nodes $v \in C_{i-p}$ (similarly, $v \in C_{i+p}$), except for at most one (a pioneer). That message originated at $v' \in C_{i-p-1} \cap L(u)$ (similarly, $v' \in C_{i+p+1} \cap L(u)$). All messages $m_v \in R_u(t_p)$, $v \in C_{i-p}$ (similarly, $v \in C_{i+p}$), are fresh.


Property 3. For every $i$, $1 \le i \le n/k$, and every node $v \in C_{i-p}$, $m_v$ is fresh for at least $T$ nodes $u \in C_i$ at time $t_p$. Recall that the threshold $T$ is a constant multiple of $\tau$.


Property 4. For every $i$, $1 \le i \le n/k$, every node $u \in C_i$, and every node $v$ such that $v \in C_j$ for some $i - p \le j \le i$, it holds that $m_v \in R_u(t_p)$ and $m_v$ is non-fresh.
We prove the four properties simultaneously by induction on the ranking phase number $p$. To prove the base cases, we assume that all events described in Lemma 3.5, Lemma 3.7 and Lemma 3.8 (for $p = 1$) occur. Notice that, by a union bound, the probability of this is at least $1 - \left(\frac{1}{n^{\alpha/24-1}} + \cdots + \frac{1}{n^{d-2}}\right)$.


Proof (Base case for Property 1). Let $p = 1$. One random phase precedes the first ranking phase. The upper bound on the number of pioneers in every clique holds by Lemma 3.7. The distribution among distinct layers follows immediately from Attribute (i) of pioneer messages. □


Proof (Base case for Property 2). Let $p = 1$. Fix some node $u \in C_i$. We analyze the possibilities for fresh messages for one direction of flow at the end of the random phase; the other direction is symmetric. By Proposition 2.1, messages $m_v \in R_u(t_1)$ originate at nodes $v \in C_{i-2} \cup C_{i-1} \cup C_i$. A message $m_v \in R_u(t_1)$ that originates at a node $v \in C_{i-2}$ is a pioneer. By Attributes (i) and (ii), there can be at most one such message, and it is fresh. For messages $m_v \in R_u(t_1)$ that originate at nodes $v \in C_{i-1}$, there are two possibilities. The first possibility is that they are received from the neighbor $u' \in C_i \cap L(v)$. This implies that they are pioneers in nodes $u_1 \in C_{i+1} \cap L(v)$ at time $t_1$. By Property 1 for $p = 1$ (which is already proved), there are at most $3\tau$ such messages. The only other possibility is that they are received from the neighbor $v' \in C_{i-1} \cap L(u)$. There are at most $\tau$ such messages, and they are all fresh. These might include the message that originates at $C_{i-2}$, as already discussed. Messages $m_v \in R_u(t_1)$ that originate at nodes $v \in C_i$ are all non-fresh, according to Lemma 3.5.

In total, at the beginning of the first ranking phase, each node $u$ has at most $4\tau$ fresh messages from one direction. All of them originated at nodes $v' \in C_{i-1}$, except for at most one, which originated at $v' \in C_{i-2} \cap L(u)$. All messages that originated at nodes $v' \in C_{i-1}$ are fresh. The other direction of flow is symmetric. □

Proof (Base case for Property 3). Let $p = 1$. For every $v \in C_{i-1}$, at the end of round 0, exactly one node $u \in C_i$ knows $m_v$. It may disseminate it during the random phase. At the end of the random phase, by Lemma 3.5, for every $v \in C_{i-1}$, $m_v$ is non-fresh in all nodes of $C_{i-1}$. That is, by the end of the random phase, every node $v' \in C_{i-1}$, $v' \ne v$, receives $m_v$ at least $T$ times, all from nodes within the clique. Therefore, at least $T$ nodes in $C_{i-1}$ send $m_v$ in the random phase, which implies that at least $T$ nodes in $C_i$ know $m_v$. According to the phase separation property, every such node in $C_i$ receives $m_v$ at most twice: once from its neighbor in $C_{i-1}$, and possibly once from the neighbor $u \in C_i$. Hence $m_v$ is fresh in it. □


Proof (Base case for Property 4). Let $p = 1$. Fix $i$ and $u \in C_i$. According to Lemma 3.5, for every node $v \in C_i$, $m_v$ is known and non-fresh in $u$.

At the beginning of the first ranking phase, according to Property 3 for $p = 1$ and $i$, every message $m_v$, $v \in C_{i-1}$, is fresh in at least $T$ nodes in $C_i$. According to Property 2 for $p = 1$, every node has at most $8\tau$ fresh messages. By Lemma 3.8, all nodes (in particular, nodes in $C_i$) send all of their fresh messages. This means that every message $m_v$, $v \in C_{i-1}$, is received by node $u$ at least $T$ times, so it becomes non-fresh. □


This completes the proof of the base cases. Recall that the base cases are proved by assuming that all events described in Lemma 3.5, Lemma 3.7 and Lemma 3.8 (for $p = 1$) occur. Thus, the properties hold for $p = 1$ with at least the probability given by the union bound above.

To prove the induction step, we assume that all events described in the four properties for $p - 1$, and in Lemma 3.8 for $p - 1$ and $p$, occur. By a union bound, this happens with probability at least one minus the sum of the corresponding failure probabilities.

Proof (Induction step for Property 1). By Property 1 for $p - 1$ and $i - 1$, at the beginning of ranking phase $p - 1$, at most $3\tau$ messages $m_v$, $v \in C_{i-p-1}$, reach nodes in $C_{i-1}$, and each reaches a distinct node $u \in C_{i-1} \cap L(v)$. At time $t_{p-1}$, by Pioneer Attribute (ii), each one of them is fresh. By Property 2 for $p - 1$ and $i - 1$, at the beginning of ranking phase $p - 1$, every node $u \in C_{i-1}$ has at most $8\tau$ fresh messages. By Lemma 3.8 for $p - 1$, every node sends all of its fresh messages during ranking phase $p - 1$, including the pioneer messages in nodes of $C_{i-1}$. Thus, consider the messages $m_v$, $v \in C_{i-p-1}$, such that $m_v$ is a pioneer at time $t_p$ in nodes of $C_i$. There are at most $3\tau$ of them, and each reaches a distinct node $u \in C_i \cap L(v)$. □


Proof (Induction step for Property 2). Fix a node $u \in C_i$, and consider one direction of flow. By Proposition 2.1, for every message $m_v \in R_u(t_p)$ (known to $u$ at the beginning of ranking phase $p$), it holds that $v \in \bigcup_{j=i-p-1}^{i} C_j$. By Property 4 for $p - 1$ and $i$, for every node $v$ such that $v \in C_j$ for some $i - p + 1 \le j \le i$, it holds that $m_v \in R_u(t_{p-1})$ and $m_v$ is non-fresh. Thus, only messages $m_v$, $v \in C_{i-p-1} \cup C_{i-p}$, can be fresh.

Consider a message $m_v$, $v \in C_{i-p}$. By Property 1 for $p - 1$ and $i$, at the beginning of ranking phase $p - 1$, any message $m_v$, $v \in C_{i-p}$, that reaches $C_i$ (a pioneer) is known to exactly one node in the clique. Thus, any message $m_v$, $v \in C_{i-p}$, that reaches $C_i$ by the beginning of ranking phase $p$ is fresh. It can be received only once from a neighbor within the clique $C_i$ and once from a neighbor in clique $C_{i-1}$, i.e., at most twice.

By Property 2 for $p - 1$ and $i - 1$, at the beginning of ranking phase $p - 1$, node $v' \in C_{i-1} \cap L(u)$ has at most $4\tau$ fresh messages for the relevant direction of flow. All of them originated at nodes $v \in C_{i-p}$, except for at most one (a pioneer), which originated at $v'' \in C_{i-p-1} \cap L(u)$. According to Lemma 3.8 for $p - 1$, node $v'$ sends all of its fresh messages during ranking phase $p - 1$. Thus, at the end of ranking phase $p - 1$ (the beginning of ranking phase $p$), they all reach $u$, and they are all fresh. In particular, they are received at most twice, by the previous discussion. The opposite direction of flow is symmetric. This completes the proof. □


Proof (Induction step for Property 3). By Property 3 for $p - 1$ and $i - 1$, at the beginning of ranking phase $p - 1$, every message $m_v$, $v \in C_{i-p}$, is fresh in at least $T$ nodes $u' \in C_{i-1}$. By Property 2 for $p - 1$ and $i - 1$, at the beginning of ranking phase $p - 1$, every node $u' \in C_{i-1}$ has at most $8\tau$ fresh messages. According to Lemma 3.8 for $p - 1$, all of them are sent during ranking phase $p - 1$, and each node $u' \in C_{i-1}$ sends to a distinct neighbor $u \in C_i$. Therefore, at the end of ranking phase $p - 1$ (the beginning of ranking phase $p$), every message $m_v$, $v \in C_{i-p}$, is known to at least $T$ nodes $u \in C_i$. By Property 2 for $p$ (which is already proved) and $i$, all messages $m_v$, $v \in C_{i-p}$, known in $C_i$ are fresh, which completes the proof. □

Proof (Induction step for Property 4). By Property 4 for $p - 1$ and $i$, consider every node $u \in C_i$ and every node $v$ such that $v \in C_j$ for some $i - p + 1 \le j \le i$. For these, $m_v \in R_u(t_{p-1})$ and $m_v$ is non-fresh. This also holds at the end of ranking phase $p$. We still need to show that the property holds for all messages $m_v$, $v \in C_{i-p}$. Notice that Properties 1, 2 and 3 are already proved for $p$.

By Property 3 for $p$ and $i$, at the beginning of ranking phase $p$, every message $m_v$, $v \in C_{i-p}$, is fresh in at least $T$ nodes $u \in C_i$. By Property 2 for $p$ and $i$, at the beginning of ranking phase $p$, every node $u \in C_i$ has at most $8\tau$ fresh messages. By Lemma 3.8 for $p$, all of them are sent during ranking phase $p$. This means that every message $m_v$, $v \in C_{i-p}$, is sent by at least $T$ nodes of the clique $C_i$. Hence it is received by every node $u \in C_i$ at least $T$ times. Thus, at the end of ranking phase $p$, every message $m_v$, $v \in C_{i-p}$, is known and non-fresh in all nodes $u \in C_i$, which completes the proof. □

Property 4 guarantees that full information spreading is completed after ranking phase $p = n/k$, with probability at least $1 - \left(\frac{1}{n^{d-3}} + \frac{1}{n^{\alpha/48-2}}\right) \ge 1 - \frac{1}{n^{c}}$ for a constant $c$. This is obtained by fixing $d$ and $\alpha$ to values $d > c + 3$ and $\alpha > 48c + 96$. This completes the proof of Lemma 3.2, from which Theorem 3.1 follows.

Theorem 4.1. Alg. 2 completes full information spreading on $G_{n,k}$ in $O\left(\frac{n}{k}\log^3 n\right)$ rounds, w.h.p.

Proving the four properties for the modified algorithm implies Lemma 3.2, from which Theorem 4.3 follows. In the previous analysis, the transition from the end of a ranking phase to the beginning of the next one was immediate. Therefore, claims that hold at the end of ranking phase $p - 1$ automatically hold at the beginning of ranking phase $p$. Here, every two consecutive ranking phases are separated by a shuffle phase, so $t_{p-1}$ and $t_p$ are no longer equal. We need to prove that the relevant claims that hold at the beginning of a shuffle phase (the end of a ranking phase) also hold at the end of the shuffle phase (the beginning of the next ranking phase). That is, we prove that shuffle phases preserve the required properties. The addition of the shuffle phase does not affect the progress of the algorithm until the end of the first ranking phase. Thus, the base case in the inductive proof of the four properties stays as is, and modifications are needed only in the proofs of the inductive steps.

Before modifying the proof of the induction step, we first prove the following.
Lemma 6.2. Assume that Properties 2, 3 and 4 hold at the end of ranking phase $p - 1$. Then, for each node $u \in C_i$ and for each direction of flow, at the end of shuffle phase $p - 1$, there are $k$ remaining messages $m_v$, $v \in C_{i-p}$ (similarly, $C_{i+p}$), in $R$ after filtering out unwanted messages (in line 29).


Proof. Properties 2 and 3 hold at the end of ranking phase $p - 1$, i.e., at the beginning of shuffle phase $p - 1$. Thus, for every node $v \in C_{i-p}$, $m_v$ is fresh in at least $T$ nodes $u \in C_i$, and every node in $C_i$ has at most $4\tau$ fresh messages per direction. Therefore, during the shuffle phase, every message $m_v$, $v \in C_{i-p}$, is sent (and thus received) at least $T$ times by nodes of $C_i$. It is therefore not filtered out at the end of the shuffle phase. As already discussed, messages $m_v$, $v \in C_{i-p-1}$, are filtered out due to low phasecnt values. Now consider every node $v$ such that $v \in C_j$ for some $i - p + 1 \le j \le i$. By Property 4 for the end of ranking phase $p - 1$, $m_v$ is non-fresh, so these messages are filtered out as well. In total, exactly the messages $m_v$, $v \in C_{i-p}$, are not filtered out. The other direction of flow is symmetric. □


Lemma 6.3. Assume that Properties 2, 3 and 4 hold at the end of ranking phase $p - 1$. Then, with probability at least $1 - 1/n^{9\alpha/16-1}$, the following holds at the end of shuffle phase $p - 1$. Every message that is not filtered out in node $u \in C_i$ is selected to be fresh by at least $T$ nodes in $C_i$.


Proof. Assume that Properties 2, 3 and 4 hold at the end of ranking phase $p - 1$. Fix $i, v$. For every $u \in C_i$, let $\mathbb{1}_{u,v}$ be an indicator variable that indicates whether node $u$ selects $m_v$ at the end of shuffle phase $p - 1$. By Lemma 6.2, there are at most $2k$ remaining messages in $R$ after filtering out unwanted messages (in line 29). Thus, the probability for each message to be among the $4\tau$ selected messages at the end of the shuffle phase is at least
$$\Pr[\mathbb{1}_{u,v} = 1] \ge \frac{2T}{k}.$$
Let $X_v = \sum_{u \in C_i} \mathbb{1}_{u,v}$ be the number of nodes in $C_i$ that select message $m_v$ at the end of shuffle phase $p - 1$. Then
$$\mu = \mathbb{E}(X_v) = \sum_{u \in C_i} \mathbb{E}(\mathbb{1}_{u,v}) \ge \sum_{u \in C_i} \frac{2T}{k} = k \cdot \frac{2T}{k} = 2T.$$
The indicator variables are independent, as they refer to decisions of distinct nodes. By applying a Chernoff bound, we get
$$\Pr[X_v < (1-\delta)\mu] < \exp\left(-\frac{\delta^2\mu}{2}\right) \le \exp\left(-\delta^2 T\right) = \exp\left(-\delta^2\alpha\log n\right) \le \frac{1}{n^{\delta^2\alpha}}.$$
By setting $\delta = 3/4$, we get that a message $m_v$ is selected to be fresh in at least $T$ nodes $u \in C_i$ with probability at least $1 - 1/n^{9\alpha/16}$. By a union bound, this holds for every node $v$ with probability at least $1 - 1/n^{9\alpha/16-1}$. □


To match the modification of the algorithm, we show that the four properties 
now hold for p with probability at least 1 — ( a /i a _ 1 H—5§s H— 9 <,fie-i ) • To prove 
the new induction step, we make assumptions similar to those made earlier for the induction step. That is, we assume that all events described in the four properties for $p - 1$, and in Lemma 3.8 for $p - 1$ and $p$, occur. In addition, we assume that the events described in Lemma 6.3 for $p - 1$ occur. In total, this happens with probability at least


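$$1 - \left(\frac{2}{n^{\alpha/48-1}} + \frac{2(p-1)}{n^{d-2}} + \frac{p-1}{n^{9\alpha/16-1}}\right) - \frac{2}{n^{d-2}} - \frac{1}{n^{9\alpha/16-1}} = 1 - \left(\frac{2}{n^{\alpha/48-1}} + \frac{2p}{n^{d-2}} + \frac{p}{n^{9\alpha/16-1}}\right).$$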


Proof (Extension of induction step for property 1). The property holds at the end of ranking phase $p-1$. At the beginning of shuffle phase $p-1$, each pioneer message in a clique is known to exactly one node in the clique. Thus, at the end of the shuffle phase, the phasecnt values of pioneer messages are at most 2 (one reception is from the respective node within the same clique, and the other is from its neighbor in the neighboring clique). Consequently, all pioneer messages are filtered out, so there are no pioneer messages at the beginning of ranking phase $p$, which completes the proof. □

Proof (Extension of induction step for property 2). The property holds at the end of ranking phase $p-1$. At the beginning of shuffle phase $p-1$, considering one direction of flow, all fresh messages $m_v$ in nodes of clique $C_i$ originate at nodes of $C_{i-p}$, except for pioneers (originating at nodes of $C_{i-p-1}$). At the end of shuffle phase $p-1$, as already discussed, all pioneer messages are filtered out due to low phasecnt values. By property 4 for the end of ranking phase $p-1$, all messages $m_v$, $v \notin C_{i-p}$, are non-fresh, so they are filtered out (if any) for being non-fresh prior to the start of shuffle phase $p-1$. Thus, considering both directions, at the end of shuffle phase $p-1$ each node selects $4\tau$ of the messages $m_v$, $v \in C_{i-p} \cup C_{i+p}$, marks them fresh, and ranks them 1 to $4\tau$. This completes the proof. □
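The filtering and selection step used in the proofs of properties 1 and 2 can be sketched as follows. This is a hypothetical rendering, not the pseudocode of the algorithm: the names `Entry`, `end_of_shuffle`, and the `threshold` parameter are illustrative, and the only rules modeled are those stated in the text, namely that non-fresh messages and messages whose phasecnt value is below a threshold (such as pioneers, which are received at most twice) are discarded, and that $4\tau$ of the survivors are chosen uniformly at random, marked fresh, and ranked.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    msg_id: int
    fresh: bool     # freshness flag of the message at this node
    phasecnt: int   # number of receptions of the message in this shuffle phase


def end_of_shuffle(entries: list[Entry], threshold: int, tau: int,
                   rng: random.Random) -> dict[int, int]:
    """Return {msg_id: rank} for the messages selected to be fresh."""
    # Filter: drop non-fresh messages and those received fewer than `threshold` times.
    survivors = [e.msg_id for e in entries if e.fresh and e.phasecnt >= threshold]
    # Select up to 4*tau survivors uniformly at random, without replacement.
    chosen = rng.sample(survivors, min(4 * tau, len(survivors)))
    # Rank the selected messages 1, 2, ... in the (random) order of selection.
    return {msg_id: rank for rank, msg_id in enumerate(chosen, start=1)}


rng = random.Random(7)
entries = [
    Entry(msg_id=0, fresh=True, phasecnt=5),   # sent by many clique members: kept
    Entry(msg_id=1, fresh=True, phasecnt=2),   # pioneer, received at most twice: filtered
    Entry(msg_id=2, fresh=False, phasecnt=9),  # already non-fresh: filtered
    Entry(msg_id=3, fresh=True, phasecnt=4),
]
print(end_of_shuffle(entries, threshold=3, tau=1, rng=rng))
```

Any threshold strictly above 2 removes pioneers in this sketch; the actual threshold used by the algorithm is defined elsewhere in the paper.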


Proof (Extension of induction step for property 3). Properties 2 and 3 hold at the end of ranking phase $p-1$, i.e., at the beginning of shuffle phase $p-1$: for every node $v \in C_{i-p}$, $m_v$ is fresh in at least $T$ nodes $u \in C_i$, and every node in $C_i$ has at most $4\tau$ fresh messages per direction. During the shuffle phase, every message $m_v$, $v \in C_{i-p}$, is sent at least $T$ times by nodes of $C_i$, and therefore it is not filtered out at the end of the shuffle phase. By Lemma 6.3 for $p-1$, each such message is selected and becomes fresh in at least $T$ nodes, which completes the proof. □


Proof (Extension of induction step for property 4). The original proof of property 4 for $p$, given in the previous section, relies on property 4 at the end of ranking phase $p-1$, on properties 2 and 3 at the beginning of ranking phase $p$, and on Lemma 3.8 for $p$. At this point, all of them are proved. Thus, the original proof of property 4 applies directly.

In other words, by property 4 for $p-1$ and $i$, for every node $u \in C_i$ and every node $v$ such that $v \in C_j$ for some $i-p+1 \le j \le i$, it holds that $m_v \in R_u$ at the end of ranking phase $p-1$ and that $m_v$ is non-fresh. Notice that shuffle phases preserve this. By property 3 for $p$ and $i$, at the beginning of ranking phase $p$, every message $m_v$, $v \in C_{i-p}$, is fresh in at least $T$ nodes $u \in C_i$. By property 2 for $p$ and $i$, at the beginning of ranking phase $p$, every node $u \in C_i$ has at most $8\tau$ fresh messages. By Lemma 3.8 for $p$, all of them are sent during ranking phase $p$. This means that every message $m_v$, $v \in C_{i-p}$, is sent by at least $T$ nodes of the clique $C_i$, which implies that every such message is received by every node $u \in C_i$ at least $T$ times. Thus, at the end of ranking phase $p$, every message $m_v$, $v \in C_{i-p}$, is known and non-fresh in all nodes $u \in C_i$, which completes the proof. □


This completes the proof. Recall that we assumed that all events described in the four properties for $p-1$, in Lemma 3.8 for $p-1$ and $p$, and in Lemma 6.3 for $p-1$ occur. Thus, the properties are proved with probability at least $1 - \left(\frac{2}{n^{\alpha/48-1}} + \frac{2p}{n^{d-2}} + \frac{p}{n^{9\alpha/16-1}}\right)$.


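Assigning $p = n/k$ in property 4 proves Lemma 3.2, from which Theorem 4.3 follows, with probability at least

$$1 - \left(\frac{2}{n^{\alpha/48-1}} + \frac{2n/k}{n^{d-2}} + \frac{n/k}{n^{9\alpha/16-1}}\right) \ge 1 - \left(\frac{1}{n^{d-3}} + \frac{1}{n^{\alpha/48-2}} + \frac{1}{n^{9\alpha/16-2}}\right) \ge 1 - \frac{3}{n^{c}}$$

for a constant $c$, by fixing $d$ and $\alpha$ to values $d \ge c+3$ and $\alpha \ge 48c+96$.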
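The final simplification can be checked numerically for concrete values. The sketch below evaluates both sides of the inequality for $c = 1$ (so $d = c + 3 = 4$ and $\alpha = 48c + 96 = 144$) and the hypothetical sizes $n = 1024$, $k = 16$; the first step only uses $n \ge 2$ and $k \ge 2$. Powers are written with negative exponents so that large values of $\alpha$ underflow to 0 instead of overflowing.

```python
def failure_bound(n: float, p: float, alpha: float, d: float) -> float:
    """Union-bound failure probability of the four properties at ranking phase p."""
    return (2 * n ** -(alpha / 48 - 1)
            + 2 * p * n ** -(d - 2)
            + p * n ** -(9 * alpha / 16 - 1))


def simplified_bound(n: float, alpha: float, d: float) -> float:
    """Right-hand side after substituting p = n/k and using n >= 2, k >= 2."""
    return n ** -(d - 3) + n ** -(alpha / 48 - 2) + n ** -(9 * alpha / 16 - 2)


c = 1
d, alpha = c + 3, 48 * c + 96
n, k = 1024, 16
exact = failure_bound(n, n / k, alpha, d)
simple = simplified_bound(n, alpha, d)
print(exact, simple, 3 / n ** c)   # exact <= simple <= 3 / n^c
```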






Fig. 4. Phases of Alg. 2.


Lemma 4.2. At the end of round $r_e$, the number of non-faulty nodes in each clique is at least $30k/32$, with probability at least $1 - 1/n^{30}$.

Proof. For every node $u$, let $\mathbb{1}_u$ be the indicator variable of the event that node $u$ is non-faulty after $r_e$ rounds. Then, by Bernoulli's inequality and since $q r_e \le 1/32$,

$$\Pr[\mathbb{1}_u = 1] = (1-q)^{r_e} \ge 1 - q r_e \ge 1 - \frac{1}{32} = \frac{31}{32}.$$


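Let $X_i = \sum_{u \in C_i} \mathbb{1}_u$, for every $i$, $1 \le i \le n/k$, be the number of non-faulty nodes in $C_i$ after $r_e$ rounds. Then

$$\mu = \mathbb{E}[X_i] = \sum_{u \in C_i} \mathbb{E}[\mathbb{1}_u] \ge \sum_{u \in C_i} \frac{31}{32} = \frac{31k}{32}.$$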


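The indicator variables are independent, as failure events of nodes are independent. By applying a Chernoff bound with $\delta = 1/31$, we get

$$\Pr\left[X_i < \frac{30}{32}k\right] \le \Pr\left[X_i < \left(1 - \frac{1}{31}\right)\mu\right] < \exp\left(-\frac{\delta^2\mu}{2}\right) \le \exp\left(-\frac{1}{31^2} \cdot \frac{31k}{2 \cdot 32}\right)$$
$$\le \exp\left(-\frac{1}{31^2} \cdot \frac{31\,(2 \cdot 32 \cdot 31^2 \log n)}{2 \cdot 32}\right) = \exp(-31 \log n) \le \frac{1}{n^{31}}.$$

The inequality in the second line holds because $k = \Omega(\log^3 n)$, so $k \ge 2 \cdot 32 \cdot 31^2 \log n$ for sufficiently large $n$. We get that at the end of round $r_e$, the number of non-faulty nodes in a clique is at least $30k/32$ with probability at least $1 - 1/n^{31}$. By a union bound over the at most $n$ cliques, this holds for every clique with probability at least $1 - 1/n^{30}$. □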
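The numerical ingredients of Lemma 4.2 can be illustrated as follows, assuming, as the proof does, that $q r_e \le 1/32$ and that node failures are independent. The sketch computes the per-node survival probability $(1-q)^{r_e}$, compares the exact binomial lower tail $\Pr[X_i < 30k/32]$ with the Chernoff bound for $\delta = 1/31$, and evaluates the clique size $2 \cdot 32 \cdot 31^2 \ln n$ at which that bound equals $n^{-31}$. The use of the natural logarithm and the concrete values of $k$, $n$, and $r_e$ are assumptions made for the example.

```python
import math


def survival_probability(q: float, r_e: int) -> float:
    """Probability that a node with per-round failure probability q survives r_e rounds."""
    return (1 - q) ** r_e


def exact_lower_tail(k: int, threshold: int) -> float:
    """Exact Pr[Bin(k, 31/32) < threshold] with integer arithmetic:
    each term is C(k, x) * 31^x / 32^k."""
    numerator = sum(math.comb(k, x) * 31 ** x for x in range(threshold))
    return numerator / 32 ** k


def chernoff_lower_tail(mu: float, delta: float) -> float:
    """Pr[X < (1 - delta) * mu] <= exp(-delta^2 * mu / 2)."""
    return math.exp(-delta ** 2 * mu / 2)


def clique_size_for_bound(n: float) -> float:
    """k at which delta^2 * mu / 2 = 31 ln n, for delta = 1/31 and mu = 31k/32."""
    return 2 * 32 * 31 ** 2 * math.log(n)


r_e = 1000
q = 1 / (32 * r_e)                               # largest q allowed by q * r_e <= 1/32
print(survival_probability(q, r_e) >= 31 / 32)   # True, by Bernoulli's inequality

k = 1024
mu = 31 * k / 32
print(exact_lower_tail(k, 30 * k // 32), chernoff_lower_tail(mu, 1 / 31))

n = 10 ** 6
k_min = clique_size_for_bound(n)
print(k_min, chernoff_lower_tail(31 * k_min / 32, 1 / 31), n ** -31.0)
```

The last line shows why the lemma needs large cliques: the Chernoff bound reaches $n^{-31}$ only once $k$ is of order $\log n$ with a large constant, which the assumption $k = \Omega(\log^3 n)$ guarantees asymptotically.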


Theorem. Alg. 2 completes full information spreading on $G_{n,k}$ in $O\left(\frac{n}{k}\log^3 n\right)$ rounds w.h.p., for any node failure probability per round $q$ with $0 \le q \le O\left(\frac{k}{n \log^3 n}\right)$.


Proof. Fix $i, p$. Let $m_v$ be a message that is fresh in at least $T$ (non-faulty) nodes in $C_{i-1}$ at the end of shuffle phase $p-1$. Here we analyze the probability that $m_v$ is not shuffled successfully in clique $C_i$. An unsuccessful shuffle might occur either because the phasecnt values in $C_i$ at the end of shuffle phase $p$ are smaller than the threshold $T^* = cT$, so the message is filtered out (denote this event by $A$), or because the message is selected by fewer than $T$ (non-faulty) nodes. By Lemma 3.8, at the beginning of shuffle phase $p$, the message $m_v$ is supposed to be fresh in at least $T$ nodes in $C_i$ (each of them gets the message from its respective neighbor in $C_{i-1}$). If one of these nodes in $C_i$ does not send $m_v$ during shuffle phase $p$, then the node, its neighbor in $C_{i-1}$, or both have become faulty by the end of shuffle phase $p$. By Bernoulli's inequality, the probability $\bar{q}$ that such a pair of nodes does not fail satisfies $\bar{q} = \left((1-q)^{r_e}\right)^2 \ge (1 - q r_e)^2 \ge 1 - 2 q r_e \ge 1 - 1/16$. Fix a set $S(m_v) \subseteq C_{i-1} \times C_i$ of $T$ pairs of nodes, each consisting of a node of $C_{i-1}$ that knows $m_v$ at the end of shuffle phase $p-1$ and its respective neighbor in $C_i$. There might exist more than $T$ such pairs, but by fixing a set of size $T$ and ignoring the rest, we bound the probability of an unsuccessful shuffle from above, as the ignored nodes can only increase the probability of success. A “surviving” pair is a pair of nodes from $S(m_v)$ that are both non-faulty at the end of the shuffle phase, and hence function properly (by sending message $m_v$) during shuffle phase $p$. Denote by $s$ the number of surviving pairs. We have that


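$$\Pr[A] \le \sum_{s=0}^{T^*-1} \binom{T}{s} \bar{q}^{\,s} (1-\bar{q})^{T-s} \le \sum_{s=0}^{T^*-1} \binom{T}{s} (1-\bar{q})^{T-s} \le \sum_{s=0}^{T^*-1} \binom{T}{s} \left(\frac{1}{16}\right)^{T-s}.$$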


We sum over all $s \in \{0, \ldots, T^*-1\}$, for which the number of survivors is lower than the threshold $cT$; this implies that the message $m_v$ is improperly filtered out at the end of the shuffle phase due to a low phasecnt value. By setting $0 < c < 1/2$, we get the following:

$$\Pr[A] \le T^* \binom{T}{T/2} \left(\frac{1}{16}\right)^{T/2} \le cT \binom{T}{T/2} \left(\frac{1}{16}\right)^{T/2} \le \frac{T}{2}\, (2e)^{T/2} \left(\frac{1}{16}\right)^{T/2} = \frac{T}{2}\left(\frac{e}{8}\right)^{T/2} \le \cdots \le \frac{1}{n^{\beta-1}}.$$

Namely, the message is not filtered out with probability at least $1 - 1/n^{\beta-1}$.
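The first part of the chain of bounds on $\Pr[A]$ can be checked exactly for small parameters. The sketch assumes an even $T$, an integer threshold $T^* = cT$ with $c < 1/2$, and a pair-survival probability $\bar q = ((1-q)^{r_e})^2$ with $q r_e \le 1/32$, so that $1 - \bar q \le 1/16$; the values $T = 40$, $c = 1/4$, and $r_e = 1000$ are hypothetical. It evaluates the exact binomial sum, the bound $T^* \binom{T}{T/2} (1/16)^{T/2}$, and the closed form $\frac{T}{2}(2e/16)^{T/2}$ obtained from $T^* \le T/2$ and $\binom{T}{T/2} \le (2e)^{T/2}$.

```python
import math


def pair_survival(q: float, r_e: int) -> float:
    """Probability that both nodes of a pair survive r_e rounds (independent failures)."""
    return ((1 - q) ** r_e) ** 2


def exact_pr_a(T: int, t_star: int, qbar: float) -> float:
    """Pr[fewer than t_star of T independent pairs survive], each surviving w.p. qbar."""
    return sum(math.comb(T, s) * qbar ** s * (1 - qbar) ** (T - s) for s in range(t_star))


def binomial_bound(T: int, t_star: int) -> float:
    """T* * C(T, T/2) * (1/16)^(T/2); valid when t_star <= T/2 and 1 - qbar <= 1/16."""
    return t_star * math.comb(T, T // 2) * (1 / 16) ** (T / 2)


def closed_form_bound(T: int) -> float:
    """(T/2) * (2e/16)^(T/2), using T* <= T/2 and C(T, T/2) <= (2e)^(T/2)."""
    return (T / 2) * (2 * math.e / 16) ** (T / 2)


T, c, r_e = 40, 0.25, 1000
t_star = int(c * T)                       # T* = cT = 10
qbar = pair_survival(1 / (32 * r_e), r_e)
print(qbar >= 15 / 16)                    # True: 1 - qbar <= 2 * q * r_e = 1/16
print(exact_pr_a(T, t_star, qbar), binomial_bound(T, t_star), closed_form_bound(T))
```

The exact probability is many orders of magnitude below the bounds; the bounds are deliberately loose so that they can be expressed as an inverse polynomial in $n$.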


The number of non-faulty nodes in each clique is at least $30k/32$ with probability at least $1 - 1/n^{30}$, by Lemma 4.2. An analysis similar to the one in the proof of Lemma 6.3 (with $\delta = 11/15$) shows that, once the message is not filtered out, it is selected by at least $T$ of the non-faulty nodes in $C_i$ with probability at least $1 - 1/n^{11^2\alpha/(15 \cdot 16)}$. In total, by a union bound, a message is not shuffled successfully between two consecutive shuffle phases with probability at most

$$\frac{1}{n^{\beta-1}} + \frac{1}{n^{30}} + \frac{1}{n^{11^2\alpha/(15 \cdot 16)}}$$

(for the value of $\alpha$ fixed earlier).
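The choice $\delta = 11/15$ can be related to the choice $\delta = 3/4$ in Lemma 6.3 with exact rational arithmetic. The sketch assumes, as in that proof, that every participating node selects $m_v$ with probability at least $2\tau/k$, and that at least a $30/32$ fraction of each clique is non-faulty (Lemma 4.2); all quantities are expressed in units of $\tau$. It shows that both choices give the same Chernoff threshold $(1-\delta)\mu \ge \tau/2$, with exponents $\delta^2\mu/2$ equal to $9\tau/16$ and $11^2\tau/(15 \cdot 16)$, respectively.

```python
from fractions import Fraction


def chernoff_parameters(alive_fraction: Fraction, delta: Fraction) -> tuple[Fraction, Fraction]:
    """Return ((1 - delta) * mu, delta^2 * mu / 2), both in units of tau, where
    mu = (alive_fraction * k) * (2 * tau / k) = alive_fraction * 2 * tau."""
    mu = alive_fraction * 2
    return (1 - delta) * mu, delta ** 2 * mu / 2


# Lemma 6.3: all k nodes of the clique participate, delta = 3/4.
print(chernoff_parameters(Fraction(1), Fraction(3, 4)))         # (Fraction(1, 2), Fraction(9, 16))
# With failures: at least 30k/32 non-faulty nodes, delta = 11/15.
print(chernoff_parameters(Fraction(30, 32), Fraction(11, 15)))  # (Fraction(1, 2), Fraction(121, 240))
```

In other words, $\delta = 11/15$ compensates for the $2/32$ fraction of possibly faulty nodes so that the selection threshold is unchanged, at the cost of a slightly smaller exponent.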

We use a union bound two more times, over all messages and over all phases, and obtain an upper bound on the probability that some message is not propagated properly. This proves that the algorithm tolerates failures that occur with probability $0 \le q \le O\left(\frac{k}{n \log^3 n}\right)$ in the given model, w.h.p. □













